Abstract

Traditionally, communication networks are modeled and analyzed in terms of information flows in 
graphs. In this paper, we introduce a novel symbolic approach to communication networks, where the 
topology of the underlying network is contained in a set of formal terms from logic. In order to account for
multi-user or dynamic networks, we allow distributed coding functions, which represent the same node in
different situations. To any choice of coding functions we associate a measure of performance referred
to as the dispersion. We show that many communication problems can be recast as dispersion
problems in this setup. For a large class of measures containing the dispersion, we first show that the
maximum flow of information transmitted to the users is asymptotically equal to the min-cut of the term
set, which represents the number of degrees of freedom of that term set. On the other hand, there are
information measures for which the maximum flow may not attain the min-cut. Second, we derive a
general principle for many-to-many cast communications in dynamic multi-user networks. We prove that
if each demand can be satisfied locally, then they can all be achieved globally, which happens when the
respective min-cuts satisfy the demands.

I. Introduction 

This paper is concerned with information flows in communication networks, for which we derive 
analogues of the max-flow min-cut theorem, a classical result for flows of ordinary commodities. For the 
max-flow min-cut theorem to hold, the entire network topology is supposed to be known, and each node 
is assumed to act independently of others. However, these two assumptions may not be valid in our
context. Indeed, the variable network topology and/or the multi-user aspect of communications lead us
to define distributed functions, which can be viewed as multiple instances of the same node in different
situations. These distributed coding functions appear to be the core of the problem for multi-user
information flows.

Unlike ordinary commodities, where transported items cannot be modified, information messages can 
be copied or mixed. This idea goes beyond traditional routing and lays the foundation of network coding
[1], [2], where the intermediate nodes combine the packets they receive and retransmit the combined
versions towards their destinations. Network coding is known to outperform routing in the case of multiple
destinations [3]. When distributed coding functions are present, network coding in general, and a new
protocol called dynamic routing in particular, can asymptotically bridge the gap between a network with
distributed coding functions and the same network without any distributed coding functions.

The max-flow min-cut theorems we derive for multi-user communication networks differ from the one 
for ordinary commodities. Indeed, although there is a max-flow min-cut theorem for some
choices of information measure (the so-called dispersion), we also prove that there are finer information
measures (the so-called Renyi entropy) which capture different information bottlenecks and thus may not
necessarily reach the min-cut.

A multi-user communication problem is given by a network with prescribed sets of sources and 
of destinations, where the destinations request messages sent by the sources. The general problem of 
determining whether all the demands of the destinations can be satisfied at the same time (i.e. whether 
the problem is solvable) is crucial, and as such has been widely studied [4], [5] since the discovery of
network coding. The problem of solvability of multi-user communications exhibits many pathological
examples; for instance, there exist communication problems which are asymptotically solvable but not
solvable for any finite alphabet [6]. Different methods have been proposed to determine whether a problem
is solvable, including graph entropy [5] and guessing games [7].

In this paper, we represent a communication problem via a set of terms, which are concepts from logic. 
A term is built on variables representing the messages sent by the sources, and on function symbols 
representing coding functions at the intermediate nodes. A term thus formally represents all the possible 
operations undergone by messages from the sources to the destinations. This novel representation has 
several advantages. First, the topology of the network is contained in the term set; we can hence work 
without the help of the adjacency matrix of the network. Second, this graph-free framework makes 
computations easier to handle. Third, it is versatile and allows one to convert a dynamic multi-user network
process into a static system. Fourth, it is actually more general than network coding, and hence offers 
not only a generalization of results in network coding, but also a reformulation in terms of flows. 
We define the min-cut of a term set, which can be viewed as the number of degrees of freedom of
the set and hence represents the information bottlenecks on the network. Conversely, to each choice of 
coding functions we associate a flow value, referred to as the dispersion, which quantifies the amount 
of information sent to the receivers. More precisely, the dispersion is the logarithm of the number of 
possible outputs of the term set, while the one-to-one dispersion is the logarithm of the number of outputs 
with exactly one pre-image, i.e. for which the input can be completely determined. 

The first main contribution of this paper is the max-flow min-cut theorem for the dispersion of term 
sets: the maximum dispersion and one-to-one dispersion of a term set are asymptotically equal to the 
value of its min-cut. 

The term sets we consider may have distributed coding functions, which happens when different 
subterms use the same coding function. For instance, distributed coding functions occur in the term 
set associated to a multi-user communication problem, where a distributed function represents the same 
intermediate node in terms received by different users. Our proof of the max-flow min-cut theorem is 
based on a novel protocol, referred to as dynamic routing, which uses dynamic headers to eliminate 
distributed functions. Clearly, this comes at a cost in bandwidth equal to the size of the header; however, 
this is a constant given by the term set and becomes negligible when the alphabet size increases. Dynamic 
routing is interesting in its own right, for unlike typical network coding approaches, such as random linear
network coding [8], data manipulation is performed on the headers only, and not on the whole packets.

If all sources are cooperative and can choose the optimal input distribution, then the maximum amount 
of information that can be inferred about the input from the received output is given by the min-cut of 
the term set. We thus introduce different measures of performance based on the Renyi entropy for the 
non-cooperative case where the inputs are uniformly distributed. We show that the dispersion is a special 
case of the Renyi entropy, while the one-to-one dispersion is an independent performance measure. 

The second main contribution is the max-flow min-cut theorem for the Renyi entropy of order
$0 < \alpha < 1$, thus strengthening the result for the dispersion. Conversely, the Renyi entropy for $\alpha > 1$ does
not necessarily reach the min-cut. Therefore, the Renyi entropy is sensitive to information bottlenecks
that cannot be taken into account via the min-cut approach. However, the case of the Shannon entropy,
where $\alpha = 1$, remains open.

In order to simplify the combinations operated at each intermediate node and the decoding at each 
destination, linear network coding only considers linear coding functions [10], [11]. It is known that linear
network coding is not optimal in general [12], [13]; we generalize the inefficiency of linear network coding
in our framework. In particular, we design a family of term sets with arbitrarily large min-cut where the
maximum dispersion achieved by linear functions is only equal to 2. This can be intuitively explained by
the fact that the dispersion of linear coding functions is equal to their min-entropy, which is the Renyi
entropy of infinite order. Conversely, we prove that if a maximum dispersion equal to the min-cut can
be achieved using coding and decoding functions based on polynomials of fixed degree, then it can be 
achieved using linear functions only. 

The third main contribution is the multi-user max-flow min-cut theorem. This shows that the maximum
dispersion received by each receiver (user) can be asymptotically attained simultaneously. In other words,
if a dispersion can be achieved locally, i.e. while disregarding the other users, it can be achieved globally,
i.e. when the other users have to be accommodated as well. This result is then applied to multi-user
communication problems such as satellite communication (the well-known butterfly network) and data
storage. 

Finally, our framework based on term sets is extended to simulate dynamic networks whose topologies 
may change over time. We view a dynamic network as possible "worlds", i.e. states in which the network 
is, and we allow the users to have requirements on the dispersion that change over time. A dynamic 
network can thus be modeled as one main term set, viewed as the union of all term sets for all users, 
possible worlds, and time-slots. Our last main contribution is the multi-user theorem for dynamic networks, 
which proves that if the demand (over all worlds and all time-slots) of each user can be satisfied locally, 
then they can all be satisfied globally. 

The rest of the paper is organized as follows. Section II reviews some key concepts of logic and
term sets and defines the analogues of flows and cuts in the new communication networks. Section III
then proves the max-flow min-cut theorems for the dispersion and the one-to-one dispersion for these
networks. The theorem for the Renyi entropy is given in Section IV. Section V then investigates the
dispersion of linear coding functions. In order to illustrate the concepts and results of this paper, a case
study of a simple term set is carried out in Section VI. In Section VII, we associate a term set to a
multi-user communication problem, and prove the multi-user max-flow min-cut theorem. Our model is
finally generalized to dynamic networks in Section VIII. Section IX then concludes and offers some
problems for future research.

II. Communication networks based on logic 

This section introduces a new type of communication networks based on term sets in logic and 
determines its main characteristics. We first review the basic concepts of logic and determine the analogue 
of a min-cut. We then view flows as transmission of data over a given alphabet, hence determining the 
analogue of max-flow.
A. Term sets 

Let $X = \{x_1, x_2, \ldots, x_k\}$ be a set of variables and consider a set of function symbols $\{f_1, f_2, \ldots, f_l\}$
with respective arities (numbers of arguments) $d_1, d_2, \ldots, d_l$. A term is defined to be an object obtained
from applying function symbols to variables recursively. For instance, if $k = 2$, $l = 3$, and
the arities are given by $d_1 = 1$, $d_2 = d_3 = 2$, then the following are terms: $t_1 = f_2(f_1(x_1), x_2)$,
$t_2 = f_1(f_3(f_2(x_2, x_1), f_3(x_1, x_2)))$, $t_3 = f_1(x_1)$. For a broader introduction to first-order logic see for
example [14]. Only Definition 3.2 on p. 18, which introduces the notions of variables, function symbols,
and terms, is relevant for us.

We say that $u$ is a subterm of $t$ if the term $u$ appears in the definition of $t$. For instance, $t_3$ is a
subterm of $t_1$ as $t_1 = f_2(t_3, x_2)$, but it is not a subterm of $t_2$. Furthermore, $u$ is a direct subterm of $t$ if
$t = f_j(v_1, \ldots, u, \ldots, v_{d_j})$, and $f_j$ is referred to as the principal function of $t$.

We shall consider finite term sets, typically referred to as $\Gamma = \{t_1, t_2, \ldots, t_r\}$, built on variables
$x_1, x_2, \ldots, x_k$ and function symbols $f_1, f_2, \ldots, f_l$ of respective arities $d_1, d_2, \ldots, d_l$. We denote the set
of variables that occur in terms in $\Gamma$ as $\Gamma_{\mathrm{var}}$ and the collection of subterms of one or more terms in $\Gamma$
as $\Gamma_{\mathrm{sub}}$; thus $\Gamma_{\mathrm{var}} \subseteq \Gamma_{\mathrm{sub}}$ and $\Gamma \subseteq \Gamma_{\mathrm{sub}}$.

We now define a term-cut, which can be viewed as replacing some subterms in the definition of a term 
by variables. 

Definition 1 (Term-cut): A set of subterms $s_1, s_2, \ldots, s_p \in \Gamma_{\mathrm{sub}}$ provides a term-cut of size $p$ for $\Gamma$ if
all the terms can be expressed syntactically by applying function symbols to $s_1, s_2, \ldots, s_p$.

A minimal term-cut for $\Gamma$ is a term-cut with minimum size; this size is referred to as the min-cut of $\Gamma$. The min-cut
can hence be viewed as the number of degrees of freedom of the term set. Clearly, the min-cut is no
more than the number of variables $k$ since $\{x_1, x_2, \ldots, x_k\}$ is a term-cut for $\Gamma$; similarly, the min-cut is
no more than the number of terms $r$, since $\Gamma$ itself is a term-cut.

Example 1: Consider the term set

$$\Gamma_1 = \{h(f(x,y), g(z,w), f(y,x)),\ m(g(z,w), f(y,x)),\ g(f(x,y), g(z,w)),\ f(g(z,w), f(y,x))\};$$

then the subterms $s_1 = f(x,y)$, $s_2 = g(z,w)$, and $s_3 = f(y,x)$ form a term-cut for $\Gamma_1$ since we can, in a
purely syntactical way, express the terms in $\Gamma_1$ by applying function symbols to $s_1$, $s_2$, and $s_3$ as

$$\Gamma_1 = \{h(s_1, s_2, s_3),\ m(s_2, s_3),\ g(s_1, s_2),\ f(s_2, s_3)\}.$$
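Definition 1 can be checked mechanically. The sketch below is only an illustration and relies on an encoding the paper does not use: a variable is a Python string and a compound term is a tuple whose first entry is the function symbol, so $f(x,y)$ becomes `("f", "x", "y")`. The helper names `expressible` and `is_term_cut` are hypothetical; the check simply asks whether every term is either in the candidate cut or a function symbol applied to terms that are themselves expressible.

```python
from typing import Iterable, Set, Union

# Assumed encoding: a variable is a str, a compound term is (symbol, arg1, ..., argd).
Term = Union[str, tuple]


def expressible(t: Term, cut: Set[Term]) -> bool:
    """True if t is in the cut or is a function symbol applied to expressible terms."""
    if t in cut:
        return True
    if isinstance(t, str):
        # A variable that is not in the cut cannot be built from the cut.
        return False
    return all(expressible(u, cut) for u in t[1:])


def is_term_cut(terms: Iterable[Term], cut: Iterable[Term]) -> bool:
    """Definition 1: every term must be expressible syntactically from the cut."""
    cut_set = set(cut)
    return all(expressible(t, cut_set) for t in terms)


# The term set Gamma_1 of Example 1, built on the variables x, y, z, w.
fxy, fyx, gzw = ("f", "x", "y"), ("f", "y", "x"), ("g", "z", "w")
gamma1 = [("h", fxy, gzw, fyx), ("m", gzw, fyx), ("g", fxy, gzw), ("f", gzw, fyx)]

print(is_term_cut(gamma1, [fxy, gzw, fyx]))  # True: the cut {s1, s2, s3}
print(is_term_cut(gamma1, [fxy, gzw]))       # False: f(y, x) is missing
```

Both the full variable set and $\Gamma_1$ itself also pass this test, in line with the two upper bounds on the min-cut given above.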

The concepts introduced so far can be illustrated graphically as follows.

Fig. 1. The graph $G_{\Gamma_1}$.

Definition 2 (The graph $G_\Gamma$): For a given term set $\Gamma$, the directed graph $G_\Gamma = (V, E, S, T)$ is defined
to have vertex set $V = \Gamma_{\mathrm{sub}}$, edge set $E = \{(u, t) : u \text{ is a direct subterm of } t\}$, source set $S = \Gamma_{\mathrm{var}}$,
and target set $T = \Gamma$.

In the graph $G_\Gamma$, each term is connected to all the variables it is built on; however, the graph is clearly
acyclic. Notice that $S \cap T$ is non-empty if $\Gamma$ contains one or more terms that are variables.

Example 2: Consider the term set $\Gamma_1$ in Example 1. The graph $G_{\Gamma_1}$ consists of a vertex for each
subterm in

$$\Gamma_{1,\mathrm{sub}} = \{x, y, z, w, f(x,y), f(y,x), g(z,w), h(f(x,y), g(z,w), f(y,x)), m(g(z,w), f(y,x)), g(f(x,y), g(z,w)), f(g(z,w), f(y,x))\}.$$

Furthermore, each variable in $\Gamma_{1,\mathrm{var}} = \{x, y, z, w\}$ represents a source node and each term in $\Gamma_1$ represents
a sink (or target) node. The graph $G_{\Gamma_1}$ is then given as in Figure 1.

Assume that $G$ is a directed graph with source set $S$ and target set $T$. We say a set $U$ of vertices is a
vertex cut (commonly referred to as a separating set) if the removal of $U$ leaves no directed path from
$S$ to $T$. If $S \cap T \neq \emptyset$, each single point in $S \cap T$ is considered to be a path from $S$ to $T$. Proposition 1
below shows that term-cuts for $\Gamma$ are equivalent to vertex cuts in $G_\Gamma$.
Proposition 1: Assume $\Gamma$ is a finite term set. A subset $C \subseteq \Gamma_{\mathrm{sub}}$ is a term-cut for $\Gamma$ if and only if $C$
is a vertex cut that separates $S = \Gamma_{\mathrm{var}}$ from $T = \Gamma$ in the directed graph $G_\Gamma$. Therefore, the min-cut of
$\Gamma$ is identical to the size of the minimal cut that separates $S$ from $T$ in the directed graph $G_\Gamma$.

The proof of Proposition 1 is given in Appendix A.

According to the directed graph version of Menger's theorem [15], there exists a family $P$ of vertex-disjoint
directed paths from $S = \Gamma_{\mathrm{var}}$ to $T = \Gamma$ and a vertex cut $C$ which consists of exactly one
vertex from each path in $P$. Moreover, given the term set $\Gamma$, it is computationally feasible to find the
exact value of the min-cut. By use of Dinic's algorithm [16] for finding max flows in networks with
unit capacities, which terminates in $O(|V|\sqrt{|E|})$ time, a term-cut of minimal size for any term set $\Gamma$ can be
returned in time $O(|\Gamma_{\mathrm{sub}}|^2)$.
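As a rough illustration of this reduction (not the authors' implementation), the sketch below computes the min-cut of a term set as the maximum number of vertex-disjoint paths from $\Gamma_{\mathrm{var}}$ to $\Gamma$ in $G_\Gamma$. It assumes that terms are encoded as nested tuples (a variable is a string, $f(x,y)$ is `("f", "x", "y")`), splits every vertex into an in/out pair joined by a unit-capacity edge, and, for brevity, uses breadth-first augmenting paths (Edmonds-Karp) instead of Dinic's algorithm. The names `subterms` and `min_cut` are hypothetical. Passing `directed=False` treats every edge of $G_\Gamma$ as undirected, which is the distinction discussed in Example 3.

```python
from collections import defaultdict, deque

# Assumed encoding: a variable is a str, a compound term is (symbol, arg1, ..., argd).


def subterms(t, acc):
    """Collect t and all of its subterms into the set acc."""
    acc.add(t)
    if not isinstance(t, str):
        for u in t[1:]:
            subterms(u, acc)
    return acc


def min_cut(terms, directed=True):
    """Size of a minimal vertex cut separating Gamma_var from Gamma in G_Gamma."""
    sub = set()
    for t in terms:
        subterms(t, sub)
    cap = defaultdict(lambda: defaultdict(int))

    def add(u, v):
        cap[u][v] += 1
        cap[v][u] += 0  # register the reverse residual edge

    for v in sub:
        add(("in", v), ("out", v))  # split v: unit vertex capacity
        if isinstance(v, str):
            add("SRC", ("in", v))  # variables are the sources S
        else:
            for u in v[1:]:  # u is a direct subterm of v
                add(("out", u), ("in", v))
                if not directed:
                    add(("out", v), ("in", u))
    for t in set(terms):
        add(("out", t), "SNK")  # terms are the targets T

    flow = 0
    while True:  # breadth-first augmenting paths (Edmonds-Karp)
        parent = {"SRC": None}
        queue = deque(["SRC"])
        while queue and "SNK" not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if "SNK" not in parent:
            return flow  # max flow = number of vertex-disjoint paths = min-cut
        v = "SNK"
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1


fxy, fyx, gzw = ("f", "x", "y"), ("f", "y", "x"), ("g", "z", "w")
gamma1 = [("h", fxy, gzw, fyx), ("m", gzw, fyx), ("g", fxy, gzw), ("f", gzw, fyx)]
print(min_cut(gamma1))  # 3, matching the term-cut of Example 1
```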

We would like to emphasize that a term-cut $C \subseteq \Gamma_{\mathrm{sub}}$ for $\Gamma$ is a vertex cut that separates
$S$ from $T$ when $G_\Gamma$ is considered as a directed graph, but not necessarily when it is considered as an
undirected graph. This is due to the antisymmetric nature of the subterm relation. Example 3 illustrates this distinction.

Example 3: Consider the term set

$$\Gamma = \{h(g(f(z), y), x),\ l(f(z)),\ l(z)\}.$$

The graph $G_\Gamma$ has vertex set $V = \{x, y, z, f(z), l(f(z)), l(z), g(f(z), y), h(g(f(z), y), x)\}$, source set
$S = \{x, y, z\}$ and target set $T = \{h(g(f(z), y), x), l(f(z)), l(z)\}$, and is displayed in Figure 2a.

Viewed as a directed graph, $G_\Gamma$ has a vertex cut (i.e. a term-cut for $\Gamma$) $C = \{h(g(f(z), y), x), z\}$ of
size 2 and contains only 2 vertex-disjoint directed paths from $S$ to $T$. Viewed as an undirected graph,
$G_\Gamma$ has a minimal vertex cut of size 3 and contains 3 vertex-disjoint undirected paths from $S$ to $T$ (see
Figure 2b).
B. Dispersion and one-to-one dispersion

So far, we have treated function symbols as abstract entities; we now assign them explicit values.

Definition 3 (Interpretation): Let $A$ be a finite set with $|A| \geq 2$, referred to as the alphabet. An
interpretation $\psi$ for $\Gamma$ over $A$ is an assignment of coding functions $\psi = \{\bar{f}_1, \bar{f}_2, \ldots, \bar{f}_l\}$ to the function
symbols, where $\bar{f}_j : A^{d_j} \to A$ for all $1 \leq j \leq l$.

Once all the function symbols $f_j$ are assigned coding functions $\bar{f}_j$, then by composition each term
$t_j \in \Gamma$ is assigned a function $\bar{t}_j : A^k \to A$. In order to simplify notation, we shall write functions by the
way they map a tuple $\mathbf{a} = (a_1, a_2, \ldots, a_k) \in A^k$, and we typically write tuples in bold face. We shall
abuse notation and also denote the induced mapping of the interpretation as $\psi : A^k \to A^r$, defined as

$$\psi(\mathbf{a}) = (\bar{t}_1(\mathbf{a}), \bar{t}_2(\mathbf{a}), \ldots, \bar{t}_r(\mathbf{a})).$$

Note that the definition of the induced mapping depends on the ordering of terms in $\Gamma$. However, our
performance measures for interpretations and induced mappings will not depend on a particular ordering. 

Example 4: Consider $\Gamma_1$ introduced in Example 1 and let $A = \mathbb{F}_2$. The interpretation $\psi = \{\bar{f}, \bar{g}, \bar{h}, \bar{m}\}$
given by $\bar{f}(a_1, a_2) = a_1$, $\bar{g}(a_1, a_2) = a_1 + a_2$, $\bar{h}(a_1, a_2, a_3) = a_2 a_3 + 1$, $\bar{m}(a_1, a_2) = a_1 a_2$ induces the
mapping

$$\psi(a_1, a_2, a_3, a_4) = \big((a_3 + a_4)a_2 + 1,\ (a_3 + a_4)a_2,\ a_1 + a_3 + a_4,\ a_3 + a_4\big).$$

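We are especially interested in how $\psi$ disperses its outputs, and how much information about the inputs
can be obtained from the outputs. For any $\mathbf{b} \in A^r$, we denote the pre-image of $\mathbf{b}$ as
$\mathrm{pre}(\mathbf{b}) = \{\mathbf{a} \in A^k : \psi(\mathbf{a}) = \mathbf{b}\}$. The image and the one-to-one image of $\psi$ are respectively defined as

$$\mathrm{image}(\psi) := \{\mathbf{b} \in A^r : |\mathrm{pre}(\mathbf{b})| \geq 1\},$$
$$\mathrm{one}(\psi) := \{\mathbf{b} \in A^r : |\mathrm{pre}(\mathbf{b})| = 1\}.$$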

We now define the analogue of the value of a flow for information transfer on networks based on logic, 
which we refer to as the dispersion. 

Definition 4: The $\Gamma$-dispersion and one-to-one $\Gamma$-dispersion of an interpretation $\psi$ for $\Gamma$ over $A$ are
respectively defined as

$$\gamma(\psi) := \log_{|A|} |\mathrm{image}(\psi)|,$$
$$\gamma_{\mathrm{one}}(\psi) := \log_{|A|} |\mathrm{one}(\psi)|.$$
We remark that since $\mathrm{one}(\psi) \subseteq \mathrm{image}(\psi)$, we have $\gamma_{\mathrm{one}}(\psi) \leq \gamma(\psi)$ for all interpretations $\psi$. For
instance, the interpretation in Example 4 has $\Gamma_1$-dispersion of $\log_2 6$, while $\psi(\mathbf{a} + (0, 0, 1, 1)) = \psi(\mathbf{a})$
for all $\mathbf{a} \in A^4$ implies that $\mathrm{one}(\psi)$ is empty, so it has one-to-one $\Gamma_1$-dispersion $-\infty$.
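For small alphabets the quantities of Definition 4 can be obtained by exhaustive enumeration. The sketch below hard-codes the interpretation of Example 4 over $\mathbb{F}_2 = \{0, 1\}$ and counts pre-images; the helper `dispersions` is a hypothetical name, and reading $\log 0$ as $-\infty$ is the convention used in the remark above. It reproduces the six outputs (so $\gamma(\psi) = \log_2 6$) and the empty one-to-one image. Enumeration costs $|A|^k$ evaluations, so this is only practical for toy examples.

```python
import math
from collections import Counter
from itertools import product


def psi(a1, a2, a3, a4):
    """Induced mapping of Example 4 over F_2 (variables x, y, z, w = a1..a4)."""
    f = lambda u, v: u
    g = lambda u, v: (u + v) % 2
    h = lambda u, v, w: (v * w + 1) % 2
    m = lambda u, v: (u * v) % 2
    x, y, z, w = a1, a2, a3, a4
    return (h(f(x, y), g(z, w), f(y, x)),
            m(g(z, w), f(y, x)),
            g(f(x, y), g(z, w)),
            f(g(z, w), f(y, x)))


def dispersions(mapping, q, k):
    """Return (|image|, |one|, gamma, gamma_one) by enumerating all of A^k, |A| = q."""
    counts = Counter(mapping(*a) for a in product(range(q), repeat=k))
    image = len(counts)
    one = sum(1 for c in counts.values() if c == 1)
    gamma = math.log(image, q)
    gamma_one = math.log(one, q) if one > 0 else -math.inf  # log of 0 read as -inf
    return image, one, gamma, gamma_one


print(dispersions(psi, 2, 4)[:2])  # (6, 0)
```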

We finally define the (one-to-one) dispersion of $\Gamma$ over $A$ as the maximal $\Gamma$-dispersion (one-to-one
$\Gamma$-dispersion, respectively) over all interpretations for $\Gamma$ over $A$, and we denote this value by $\gamma(\Gamma, |A|)$
(by $\gamma_{\mathrm{one}}(\Gamma, |A|)$, respectively), as this quantity clearly depends on $A$ via its cardinality only.

III. Max-flow min-cut theorem for the dispersion of term sets 

The main purpose of this section is to prove the following max-flow min-cut theorem for the dispersion
and the one-to-one dispersion of term sets. 

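Theorem 1 (Max-flow min-cut theorem for dispersion): Let $\Gamma$ be a term set with min-cut of $p$; then
for any alphabet $A$,

$$\gamma_{\mathrm{one}}(\Gamma, |A|) \leq \gamma(\Gamma, |A|) \leq p.$$

Conversely,

$$\lim_{|A| \to \infty} \gamma_{\mathrm{one}}(\Gamma, |A|) = \lim_{|A| \to \infty} \gamma(\Gamma, |A|) = p.$$

The first part of the max-flow min-cut theorem is easily proved.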

Lemma 1: Let $\Gamma$ be a term set built on $k$ variables and with min-cut of $p \leq k$. Then for all $A$,
$\gamma_{\mathrm{one}}(\Gamma, |A|) \leq \gamma(\Gamma, |A|) \leq p$. Furthermore, if $p < k$, then $\gamma_{\mathrm{one}}(\Gamma, |A|) \leq \log_{|A|}(|A|^p - 1) < p$.

Proof: Let $C$ be a minimal term-cut for $\Gamma$, and let $\psi_\Gamma$ be an interpretation for $\Gamma$ over $A$. Since $C$ can be
viewed as a term set, $\psi_\Gamma$ also induces an interpretation $\psi_C$ for $C$ over $A$, and the size of the image of its
induced mapping is at most $|A|^p$. Since all terms of $\Gamma$ can be expressed as functions of elements
of $C$, the size of the image of $\psi_\Gamma$ is at most that of $\psi_C$; hence $|\mathrm{image}(\psi_\Gamma)| \leq |A|^p$ and $\gamma(\Gamma, |A|) \leq p$.

Furthermore, if $p < k$, the average number of pre-images per element of $\mathrm{image}(\psi_\Gamma)$ is at least
$|A|^k / |A|^p > 1$. Therefore, there exists an element with more than one pre-image, and $|\mathrm{one}(\psi_\Gamma)| \leq |A|^p - 1$. ■

A. Diversified term sets

We first prove the max-flow min-cut theorem for the dispersion in the specific case where each subterm
has a distinct function symbol. More specifically, we define the diversified term set by assigning a new 
function symbol to each subterm that is not a variable. 

Definition 5 (Diversified term set): For any term set $\Gamma$, the diversified term set $\Gamma^{\mathrm{div}}$ is built on the
same variables as $\Gamma$ and its function symbols are obtained by replacing the principal function $g$ of any
$u \in \Gamma_{\mathrm{sub}} \setminus \Gamma_{\mathrm{var}}$ by a new function symbol $g_u$ of the same arity as $g$.
Example 5: Recall the term set $\Gamma_1$ from Example 1:

$$\Gamma_1 = \{h(f(x,y), g(z,w), f(y,x)),\ m(g(z,w), f(y,x)),\ g(f(x,y), g(z,w)),\ f(g(z,w), f(y,x))\};$$

then

$$\begin{aligned}
\Gamma_1^{\mathrm{div}} = \{\, & h_{h(f(x,y),g(z,w),f(y,x))}\big(f_{f(x,y)}(x,y),\, g_{g(z,w)}(z,w),\, f_{f(y,x)}(y,x)\big), \\
& m_{m(g(z,w),f(y,x))}\big(g_{g(z,w)}(z,w),\, f_{f(y,x)}(y,x)\big), \\
& g_{g(f(x,y),g(z,w))}\big(f_{f(x,y)}(x,y),\, g_{g(z,w)}(z,w)\big),\ f_{f(g(z,w),f(y,x))}\big(g_{g(z,w)}(z,w),\, f_{f(y,x)}(y,x)\big) \,\}.
\end{aligned}$$

We can simplify the indices and rewrite the diversified term set as

$$\{h(f_1(x,y), g_1(z,w), f_2(y,x)),\ m(g_1(z,w), f_2(y,x)),\ g_2(f_1(x,y), g_1(z,w)),\ f_3(g_1(z,w), f_2(y,x))\}.$$

We remark that before diversification, the same function symbol may be assigned to different subterms
(e.g., $f(x,y)$ and $f(y,x)$ have the same principal function symbol in $\Gamma_1$). However, after diversification,
there cannot be such overlap, as each subterm is assigned a distinct principal function. By definition, it
is easily seen that the graph $G_{\Gamma^{\mathrm{div}}}$ is isomorphic to $G_\Gamma$. In particular, $\Gamma$ and $\Gamma^{\mathrm{div}}$ have the same min-cut.
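Definition 5 amounts to a renaming pass over the terms. In the illustrative sketch below, terms are assumed to be nested tuples such as `("f", "x", "y")` for $f(x,y)$, and the hypothetical helper `diversify` gives each distinct non-variable subterm a fresh symbol built from its old symbol and a counter, while repeated occurrences of the same subterm keep the same fresh symbol. On $\Gamma_1$ it reproduces the simplified form of Example 5, except that $h$ and $m$ are renamed $h1$ and $m1$ as well.

```python
from collections import Counter

# Assumed encoding: a variable is a str, a compound term is (symbol, arg1, ..., argd).


def diversify(terms):
    """Give every distinct non-variable subterm its own principal function symbol."""
    fresh = {}        # original subterm -> new symbol
    used = Counter()  # number of fresh symbols created from each old symbol

    def rename(t):
        if isinstance(t, str):
            return t  # variables are unchanged
        args = tuple(rename(u) for u in t[1:])
        if t not in fresh:
            used[t[0]] += 1
            fresh[t] = f"{t[0]}{used[t[0]]}"
        return (fresh[t],) + args

    return [rename(t) for t in terms]


fxy, fyx, gzw = ("f", "x", "y"), ("f", "y", "x"), ("g", "z", "w")
gamma1 = [("h", fxy, gzw, fyx), ("m", gzw, fyx), ("g", fxy, gzw), ("f", gzw, fyx)]
for t in diversify(gamma1):
    print(t)
```

Only the symbols change, so the term structure, and with it the graph and the min-cut, is left untouched.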

For diversified term sets, maximal dispersion can be achieved via routing, which is defined in a similar
way to the case of ordinary networks. Let $\Gamma = \{t_1, t_2, \ldots, t_r\}$ be built on the variables $\{x_1, x_2, \ldots, x_k\}$
and have min-cut of $p$. Let $P$ be a set of $p$ vertex-disjoint paths from $\Gamma_{\mathrm{var}}$ to $\Gamma$ in $G_\Gamma$ which, without
loss of generality, start in $x_1, x_2, \ldots, x_p$ and end in $t_1, t_2, \ldots, t_p$, respectively.

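Definition 6 (Routing): A distinct function symbol $g_v$ is associated to each subterm $v \in \Gamma_{\mathrm{sub}}$. If $v$ lies on a
path in $P$ and $u_j$ is the direct subterm of $v$ on the same path, then we let $\bar{g}_v(a_1, a_2, \ldots, a_d) = a_j$. Otherwise, i.e.
if $v$ does not belong to any path in $P$, then $\bar{g}_v(a_1, a_2, \ldots, a_d) = 1$, where $1$ denotes a fixed element of the alphabet.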

Note that our definition of routing depends on the set of paths $P$, and hence is not unique. However,
the dispersion and one-to-one dispersion of routing do not depend on the choice of $P$. It is straightforward
to verify that using routing, all points of the form $(a_1, a_2, \ldots, a_p, 1, \ldots, 1) \in A^k$ are mapped to
$(a_1, a_2, \ldots, a_p, 1, \ldots, 1) \in A^r$, thus yielding a $\Gamma$-dispersion of $p$. Furthermore, when $p = k$, the induced
mapping (restricted to the first $p$ coordinates) becomes the identity on $A^p$ and hence $\gamma_{\mathrm{one}}(\Gamma^{\mathrm{div}}, |A|) = p$.
However, routing has one-to-one dispersion $-\infty$ when $p < k$. In order to overcome this drawback, we define
one-to-one routing below.

Definition 7 (One-to-one routing): Let $v$ be a subterm of the form $v = g_v(u_1, u_2, \ldots, u_d)$, and denote
the arguments equal to the variables $x_{p+1}, x_{p+2}, \ldots, x_k$ as $u_{i_1}, u_{i_2}, \ldots, u_{i_c}$. We define the coding
function $\bar{g}_v : A^d \to A$ as follows. If a path in $P$ goes through $v$, denote the direct subterm of $v$ on the
same path as $u_j$; then, if $a_{i_1} = a_{i_2} = \cdots = a_{i_c} = 1$, we let $\bar{g}_v(a_1, a_2, \ldots, a_d) = a_j$. Otherwise, let
$\bar{g}_v(a_1, a_2, \ldots, a_d) = 1$.

With one-to-one routing, it is straightforward to check that the $(|A| - 1)^p$ points of the form
$(a_1, a_2, \ldots, a_p, 1, \ldots, 1) \in A^k$ with $a_1 \neq 1, a_2 \neq 1, \ldots, a_p \neq 1$ are mapped in a one-to-one fashion to
$(a_1, a_2, \ldots, a_p, 1, \ldots, 1) \in A^r$, thus yielding a one-to-one $\Gamma$-dispersion of at least $p \log_{|A|}(|A| - 1)$.
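To make Definitions 6 and 7 concrete, the sketch below applies both routing schemes to $\Gamma_1^{\mathrm{div}}$ over $A = \{0, 1, 2\}$, with $1$ as the distinguished symbol. The nested-tuple encoding of terms, the particular path family $P$ and all names are assumptions made for this illustration. For $\Gamma_1$ we have $k = 4$ and $p = 3$, so $w$ is the only variable that starts no path. Plain routing reaches $|A|^p = 27$ outputs but none with a single pre-image, while one-to-one routing sends every point $(a_1, a_2, a_3, 1)$ with $a_1, a_2, a_3 \neq 1$ to an output with exactly one pre-image.

```python
from itertools import product

# Gamma_1 of Example 1 with an assumed nested-tuple encoding. Since the term set is
# diversified, each subterm has its own coding function, so we key them by subterm.
fxy, fyx, gzw = ("f", "x", "y"), ("f", "y", "x"), ("g", "z", "w")
TERMS = [("h", fxy, gzw, fyx), ("m", gzw, fyx), ("g", fxy, gzw), ("f", gzw, fyx)]
VARS = ["x", "y", "z", "w"]

# Hypothetical choice of p = 3 vertex-disjoint paths P in G_Gamma1:
#   x -> f(x,y) -> g(f(x,y), g(z,w))
#   z -> g(z,w) -> m(g(z,w), f(y,x))
#   y -> f(y,x) -> f(g(z,w), f(y,x))
# ON_PATH maps each non-variable vertex on a path to the (0-based) position of the
# argument that precedes it on that path.
ON_PATH = {fxy: 0, TERMS[2]: 0, gzw: 0, TERMS[1]: 0, fyx: 0, TERMS[3]: 1}
EXTRA = {"w"}  # the variables x_{p+1}, ..., x_k, which start no path
ONE = 1        # the fixed symbol "1" of the alphabet


def evaluate(t, env, one_to_one):
    """Value of term t under routing (Definition 6) or one-to-one routing (Definition 7)."""
    if isinstance(t, str):
        return env[t]
    args = [evaluate(u, env, one_to_one) for u in t[1:]]
    if t not in ON_PATH:
        return ONE
    if one_to_one and any(u in EXTRA and a != ONE for u, a in zip(t[1:], args)):
        return ONE
    return args[ON_PATH[t]]


def preimages(q, one_to_one=False):
    """Map every output in A^r to its list of pre-images, for A = {0, ..., q-1}."""
    table = {}
    for a in product(range(q), repeat=len(VARS)):
        env = dict(zip(VARS, a))
        b = tuple(evaluate(t, env, one_to_one) for t in TERMS)
        table.setdefault(b, []).append(a)
    return table


routed = preimages(3)
routed_1to1 = preimages(3, one_to_one=True)
print(len(routed))                                        # 27 = |A|^p
print(sum(len(pre) == 1 for pre in routed_1to1.values()))  # 18
```

In this instance one-to-one routing even produces $|A|^2(|A| - 1) = 18$ one-to-one outputs, which lies between the bounds $(|A| - 1)^p = 8$ and $|A|^p - 1 = 26$.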

We obtain the following max-flow min-cut result for diversified term sets.

Proposition 2: Assume $\Gamma$ is a term set built on $k$ variables and with min-cut $p$. Let $A$ be an alphabet of
size $|A| \geq 2$; then $\gamma(\Gamma^{\mathrm{div}}, |A|) = p$, and it is achieved by routing. Furthermore, if $p = k$,
$\gamma_{\mathrm{one}}(\Gamma^{\mathrm{div}}, |A|) = p$ is achieved by routing, while if $p < k$, one-to-one routing yields

$$p \log_{|A|}(|A| - 1) \leq \gamma_{\mathrm{one}}(\Gamma^{\mathrm{div}}, |A|) \leq \log_{|A|}(|A|^p - 1).$$



B. Dynamic routing

The construction of coding functions in Section III-A used the fact that each subterm $v$ was assigned
a distinct function symbol. However, in general distinct subterms might be assigned the same function
symbol (e.g., $f(x,y)$ and $f(y,x)$). The proof of the general case relies on dynamic routing, defined
below.

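For $|A| > |\Gamma_{\mathrm{sub}}|$, there exist two sets $B$ and $R$ with $1 \leq |R| \leq |\Gamma_{\mathrm{sub}}|$ such that
$|A| = |(\Gamma_{\mathrm{sub}} \times B) \cup R|$, where the union is disjoint; in particular, $|B| \geq |A|/|\Gamma_{\mathrm{sub}}| - 1$. We shall
abuse notation slightly and assume $A = (\Gamma_{\mathrm{sub}} \times B) \cup R$. By construction, a tuple
$\mathbf{a} = (a_1, a_2, \ldots, a_{d_j}) \in A^{d_j}$ either has an element in $R$ or has each $a_i = (u_i, b_i) \in \Gamma_{\mathrm{sub}} \times B$.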

Definition 8 (Dynamic routing): Consider the term set $\Gamma^{\mathrm{div}}$ first, which contains one function symbol
$g_v$ for each subterm $v \in \Gamma_{\mathrm{sub}}$. Select coding functions $\bar{g}_v$ over $B$ using routing, as in Definition 6. We
then define the functions $\bar{f}_j(a_1, a_2, \ldots, a_{d_j})$ over $A$ as follows. If each $a_i$ is of the form
$a_i = (u_i, b_i) \in \Gamma_{\mathrm{sub}} \times B$, let $s$ denote the term $s = f_j(u_1, u_2, \ldots, u_{d_j})$; then, if $s \in \Gamma_{\mathrm{sub}}$,

$$\bar{f}_j(a_1, a_2, \ldots, a_{d_j}) = (s, \bar{g}_s(b_1, b_2, \ldots, b_{d_j})) \in \Gamma_{\mathrm{sub}} \times B.$$

Otherwise, let $\bar{f}_j(a_1, a_2, \ldots, a_{d_j}) = r$ for some $r \in R$.

We can similarly define dynamic one-to-one routing. Remark that the headers $u_i$ of the inputs then
indicate to the coding function $\bar{f}_j$ which subterm $v$ it is located on, and hence which function $\bar{g}_v$ to use.
We say an input message $a_j$ for the variable $x_j$ is correctly formatted if it is of the form $(x_j, b_j)$ where
$b_j \in B$, and we denote the set of all correctly formatted inputs as $I := \{\mathbf{a} \in (\Gamma_{\mathrm{var}} \times B)^k : a_j = (x_j, b_j)\}$.
Moreover, the set of correctly formatted outputs is denoted as $J := \{\mathbf{a} \in (\Gamma \times B)^r : a_j = (t_j, b_j)\}$, and
for all $\mathbf{a} \in J$, we denote the data part of $\mathbf{a}$ as $\delta(\mathbf{a}) = (b_1, b_2, \ldots, b_r) \in B^r$. The idea behind dynamic
routing is that if all inputs are correctly formatted (i.e. have the correct headers) then the coding functions
$\bar{f}_j$ mimic the behavior of the routing functions $\bar{g}_v$. Thus, correctly formatted messages in $I$ are mapped (in
a one-to-one fashion if one-to-one routing is used) to correctly formatted outputs in $J$ (as long as they
are mapped by the functions $\bar{g}_v$), while other messages will be mapped to an "error message" in $R$. We
obtain the following lemma.

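Lemma 2: Let $\psi$ be a dynamic routing interpretation for $\Gamma$ over $A$ based on the routing interpretation
$\phi$ for $\Gamma^{\mathrm{div}}$ over $B$; then $\{\mathbf{a} \in J : \delta(\mathbf{a}) \in \mathrm{image}(\phi)\} \subseteq \mathrm{image}(\psi)$. Similarly, if $\psi_{\mathrm{one}}$ is a dynamic
one-to-one routing interpretation for $\Gamma$ over $A$ based on the one-to-one routing interpretation $\phi_{\mathrm{one}}$ for
$\Gamma^{\mathrm{div}}$ over $B$, then $\{\mathbf{a} \in J : \delta(\mathbf{a}) \in \mathrm{one}(\phi_{\mathrm{one}})\} \subseteq \mathrm{one}(\psi_{\mathrm{one}})$.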
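The header mechanism of Definition 8 and the first inclusion of Lemma 2 can be illustrated on $\Gamma_1$ itself, where the symbols $f$ and $g$ are shared by several subterms. In this sketch, which is an illustration under stated assumptions rather than the paper's construction verbatim, a symbol of $A$ is either a pair (header, data) with the header a subterm in nested-tuple form and the data in $B = \{0, 1, 2\}$, or the single error message `"r"`; thus $|R| = 1$ and $|A| = 11 \cdot 3 + 1 = 34$. The underlying routing over $B$ uses a hypothetical choice of three vertex-disjoint paths, and all names are assumptions.

```python
from itertools import product


def subterms(t, acc):
    """Collect t and its subterms (variables are str, compound terms are tuples)."""
    acc.add(t)
    if not isinstance(t, str):
        for u in t[1:]:
            subterms(u, acc)
    return acc


fxy, fyx, gzw = ("f", "x", "y"), ("f", "y", "x"), ("g", "z", "w")
TERMS = [("h", fxy, gzw, fyx), ("m", gzw, fyx), ("g", fxy, gzw), ("f", gzw, fyx)]
VARS = ["x", "y", "z", "w"]
SUB = set()
for t in TERMS:
    subterms(t, SUB)

B = range(3)   # data alphabet of the underlying routing
ONE = 1        # fixed symbol "1" of B
ERROR = "r"    # the only error message in R
# Hypothetical paths x->f(x,y)->g(...), z->g(z,w)->m(...), y->f(y,x)->f(g(z,w), f(y,x)):
# position of the on-path argument for every non-variable vertex on a path.
ON_PATH = {fxy: 0, TERMS[2]: 0, gzw: 0, TERMS[1]: 0, fyx: 0, TERMS[3]: 1}


def routing(s, data):
    """Routing function for subterm s of the diversified term set, over B."""
    return data[ON_PATH[s]] if s in ON_PATH else ONE


def dynamic(sym):
    """One coding function per function symbol, shared by every subterm using it."""
    def fbar(*args):
        if ERROR in args:
            return ERROR
        s = (sym,) + tuple(u for u, _ in args)  # rebuild the subterm from the headers
        if s not in SUB:
            return ERROR
        return (s, routing(s, [b for _, b in args]))
    return fbar


FBAR = {sym: dynamic(sym) for sym in "fghm"}


def psi(inputs):
    """Induced mapping of dynamic routing; inputs[i] is the symbol sent by VARS[i]."""
    env = dict(zip(VARS, inputs))

    def ev(t):
        return env[t] if isinstance(t, str) else FBAR[t[0]](*(ev(u) for u in t[1:]))

    return tuple(ev(t) for t in TERMS)


good = [tuple(zip(VARS, b)) for b in product(B, repeat=4)]  # correctly formatted inputs
outputs = {psi(a) for a in good}
print(len({tuple(b for _, b in o) for o in outputs}))  # 27 = |B|^p distinct data parts
print(psi((("z", 0), ("y", 0), ("z", 0), ("w", 0))))    # ('r', 'r', 'r', 'r')
```

The last line shows an input for $x$ carrying the wrong header: the shared function for $f$ cannot find $f(z, y)$ or $f(y, z)$ in $\Gamma_{1,\mathrm{sub}}$, and every output becomes the error message.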

Lemma 2 together with Proposition 2 gives a lower bound on the dispersion and one-to-one dispersion.
By choosing an appropriate alphabet size, we can prove the following quantitative version of the max-flow 
min-cut theorem for the dispersion. 

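Theorem 2: Let $\Gamma$ be a term set built on $k$ variables and with min-cut of $p$. For $0 < \epsilon < p$, let
$n_1 := |\Gamma_{\mathrm{sub}}|^{p/\epsilon}(1 - |\Gamma_{\mathrm{sub}}|^{1 - p/\epsilon})^{-p/\epsilon}$. Then for all $|A| \geq n_1$, $\gamma(\Gamma, |A|) \geq p - \epsilon$, and if $p = k$,
$\gamma_{\mathrm{one}}(\Gamma, |A|) \geq p - \epsilon$. These are achieved by dynamic routing.

Moreover, for $0 < \epsilon < \frac{p}{1 + \log_{|\Gamma_{\mathrm{sub}}|} 2}$, let $n_2 := |\Gamma_{\mathrm{sub}}|^{p/\epsilon}(1 - 2|\Gamma_{\mathrm{sub}}|^{1 - p/\epsilon})^{-p/\epsilon}$; then for all
$|A| \geq n_2$, if $p < k$, $\gamma_{\mathrm{one}}(\Gamma, |A|) \geq p - \epsilon$ is achieved by dynamic one-to-one routing.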

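Proof: We only prove the case involving $n_2$, the other being proved similarly. Suppose $A$ is an
alphabet with $|A| \geq n_2$ and let $\psi$ be the mapping induced by dynamic one-to-one routing for $\Gamma$ over $A$.
By Lemma 2 and Proposition 2, $|\mathrm{one}(\psi)| \geq (|B| - 1)^p$. We have

$$\frac{|B| - 1}{|A|^{1 - \epsilon/p}} \geq \frac{|A|^{\epsilon/p}}{|\Gamma_{\mathrm{sub}}|}\left(1 - \frac{2|\Gamma_{\mathrm{sub}}|}{|A|}\right) \geq \frac{|A|^{\epsilon/p}}{|\Gamma_{\mathrm{sub}}|}\left(1 - \frac{2|\Gamma_{\mathrm{sub}}|}{|\Gamma_{\mathrm{sub}}|^{p/\epsilon}}\right) \geq 1,$$

where the successive inequalities follow from $|B| \geq |A|/|\Gamma_{\mathrm{sub}}| - 1$, $n_2 \geq |\Gamma_{\mathrm{sub}}|^{p/\epsilon}$, and the definition of $n_2$,
respectively. Thus $\gamma_{\mathrm{one}}(\Gamma, |A|) \geq \log_{|A|}(|B| - 1)^p \geq p - \epsilon$. ■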
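The thresholds of Theorem 2 are explicit, so they can be evaluated and the resulting bounds checked numerically. The sketch below uses hypothetical helper names: `thresholds` computes $n_1$ and $n_2$, and `split_alphabet` writes an alphabet size $q$ as $q = |\Gamma_{\mathrm{sub}}| \cdot |B| + |R|$ with $1 \leq |R| \leq |\Gamma_{\mathrm{sub}}|$, as in the construction preceding Definition 8. The sample values $|\Gamma_{1,\mathrm{sub}}| = 11$ and $p = 3$ come from Examples 1 and 2; the check is that $\log_q |B|^p$ and $\log_q (|B| - 1)^p$ are at least $p - \epsilon$ above the respective thresholds.

```python
import math


def thresholds(n_sub, p, eps):
    """n_1 and n_2 of Theorem 2; n_2 is None when eps is outside its admissible range."""
    e = p / eps
    n1 = n_sub ** e * (1 - n_sub ** (1 - e)) ** (-e)
    n2 = None
    if eps < p / (1 + math.log(2, n_sub)):
        n2 = n_sub ** e * (1 - 2 * n_sub ** (1 - e)) ** (-e)
    return n1, n2


def split_alphabet(q, n_sub):
    """(|B|, |R|) with q = n_sub * |B| + |R| and 1 <= |R| <= n_sub."""
    b = (q - 1) // n_sub
    return b, q - n_sub * b


n_sub, p, eps = 11, 3, 1.0
n1, n2 = thresholds(n_sub, p, eps)
q = math.ceil(n2)
b, r = split_alphabet(q, n_sub)
print(q, b, r, p * math.log(b - 1, q) >= p - eps)
```

The thresholds grow like $|\Gamma_{\mathrm{sub}}|^{p/\epsilon}$, so halving $\epsilon$ roughly squares the required alphabet size.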

IV. Max-flow min-cut for the Renyi entropy 

A. Renyi entropy of an interpretation 

Let $\Gamma$ be a set of $r$ terms built on $k$ variables, and let $\psi$ be an interpretation for $\Gamma$ over an alphabet
$A$. Once $A$ and $\psi$ are fixed, the flow of data from the inputs $\mathbf{a} \in A^k$ to the outputs $\psi(\mathbf{a}) \in A^r$ can be
viewed as the transmission of a random variable $\mathbf{a}$ taking values in $A^k$ through the deterministic channel
operating the induced mapping $\psi$. Its capacity is easily computed: denoting the mutual information
between two random variables $X$ and $Y$ as $I(X; Y)$, and noting that $H(\psi(\mathbf{a}) \mid \mathbf{a}) = 0$ since the channel is
deterministic, we have

$$C_\psi := \sup_{\mathbf{a}} I(\mathbf{a}; \psi(\mathbf{a})) = \sup_{\mathbf{a}} H(\psi(\mathbf{a})) = \gamma(\psi), \qquad (1)$$

where $H$ denotes the Shannon entropy, measured in base $|A|$. The maximum is reached when $\mathbf{a}$ has the following probability
distribution: for each $\mathbf{b} \in \mathrm{image}(\psi)$, select $\mathbf{a}(\mathbf{b}) \in \mathrm{pre}(\mathbf{b})$ and let $P\{\mathbf{a} = \mathbf{a}(\mathbf{b})\} = |\mathrm{image}(\psi)|^{-1}$. Eq. (1)
shows that the capacity of the channel is given by the dispersion of the interpretation considered. This
justifies our study of the dispersion in Section III. Thus, the max-flow min-cut theorem for the dispersion
states that the channel capacity asymptotically converges to the min-cut $p$ of the term set, i.e.

$$\sup_{A, \psi} C_\psi = \sup_{A, \psi} \gamma(\psi) = p.$$

We note that the capacity is achieved for a specific input random variable which is not uniformly 
distributed over all inputs. This represents the capacity achieved when the sources are cooperative and 
agree on a coding scheme for the input. We are now interested in the case where the sources are non- 
cooperative, and as such we assume that the inputs are uniformly distributed over A^. 

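This opens the question of the most accurate measure of performance for a term set. If the input
$\mathbf{a} \in A^k$ is uniformly distributed, then $\psi(\mathbf{a})$ is a random variable with values in $A^r$, where for all $\mathbf{b} \in A^r$,

$$p_{\mathbf{b}} := P\{\psi(\mathbf{a}) = \mathbf{b}\} = |\mathrm{pre}(\mathbf{b})|\, |A|^{-k}.$$

The normalized Renyi entropy over $A$ of order $0 \leq \alpha \leq \infty$ of the random variable $\psi(\mathbf{a})$, which we will
simply denote as $H_\alpha(\psi)$, is thus given by [9]

$$H_\alpha(\psi) := \frac{1}{1 - \alpha} \log_{|A|} \sum_{\mathbf{b} \in A^r} p_{\mathbf{b}}^\alpha = \frac{\alpha}{\alpha - 1}\left(k - \frac{1}{\alpha}\log_{|A|} \sum_{\mathbf{b} \in A^r} |\mathrm{pre}(\mathbf{b})|^\alpha\right), \qquad 0 < \alpha < 1 \text{ or } 1 < \alpha < \infty.$$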

Three further special cases need close attention. 

First, when $\alpha = 0$, the Renyi entropy is the logarithm of the number of outcomes,
which is often referred to as the Hartley entropy. In our case, the Hartley entropy of $\psi$ is its dispersion,
which is equal to the channel capacity:

$$H_0(\psi) := \log_{|A|} |\{\mathbf{b} \in A^r : p_{\mathbf{b}} > 0\}| = \gamma(\psi).$$



March 2, 2011 



DRAFT 



14 



Second, we remark that $\log_{|A|} |\mathrm{pre}(\mathbf{b})|$ is the uncertainty about the input when the message $\mathbf{b}$ is
received. Hence the variable $k - \log_{|A|} |\mathrm{pre}(\mathbf{b})|$ is the amount of information (counted in symbols in $A$)
that can be inferred about $\mathbf{a}$ from $\mathbf{b} = \psi(\mathbf{a})$. The Shannon entropy, obtained when $\alpha = 1$, is therefore
the expected amount of information inferred from the term set about the input messages:

$$H_1(\psi) := -\sum_{\mathbf{b} \in A^r} p_{\mathbf{b}} \log_{|A|} p_{\mathbf{b}} = \mathbb{E}\{k - \log_{|A|} |\mathrm{pre}(\mathbf{b})|\}.$$
beA- 

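Third, when $\alpha = \infty$, the min-entropy quantifies the amount of information that can be inferred from
any output, by considering the point $\mathbf{b} \in A^r$ with the most pre-images:

$$H_\infty(\psi) := -\log_{|A|} \max_{\mathbf{b}} p_{\mathbf{b}} = \min_{\mathbf{b} \in \mathrm{image}(\psi)} \{k - \log_{|A|} |\mathrm{pre}(\mathbf{b})|\}.$$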
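All of these entropies depend only on the pre-image sizes, so for small examples they can be computed by enumeration. The sketch below (hypothetical helper `renyi`) does this for the interpretation of Example 4 with uniformly distributed inputs. It gives $H_0 = \log_2 6 \approx 2.585$, which is also the capacity in Eq. (1), $H_1 = 2.5$, $H_2 = 4 - \log_2 3 \approx 2.415$ and $H_\infty = 2$, so the entropies decrease as the order $\alpha$ grows.

```python
import math
from collections import Counter
from itertools import product


def psi(x, y, z, w):
    """Induced mapping of Example 4 over F_2."""
    s = (z + w) % 2  # value of the subterm g(z, w)
    return ((s * y + 1) % 2, (s * y) % 2, (x + s) % 2, s)


def renyi(mapping, q, k, alpha):
    """Normalized (base-q) Renyi entropy of mapping(a) for a uniform on A^k, |A| = q."""
    sizes = Counter(mapping(*a) for a in product(range(q), repeat=k)).values()
    probs = [c / q ** k for c in sizes]
    if alpha == 0:
        return math.log(len(probs), q)  # Hartley entropy: the dispersion
    if alpha == 1:
        return -sum(p * math.log(p, q) for p in probs)  # Shannon entropy
    if alpha == math.inf:
        return -math.log(max(probs), q)  # min-entropy
    return math.log(sum(p ** alpha for p in probs), q) / (1 - alpha)


for alpha in (0, 0.5, 1, 2, math.inf):
    print(alpha, round(renyi(psi, 2, 4, alpha), 4))
```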

Note that there exist interpretations $\psi_1$ and $\psi_2$ such that $H_0(\psi_1) > H_0(\psi_2)$ and yet $H_\alpha(\psi_1) < H_\alpha(\psi_2)$
for some $\alpha > 0$. Therefore, having the highest dispersion does not guarantee good performance for the
other measures.

Although the dispersion is a special Renyi entropy, Proposition 3 below shows that the one-to-one dispersion cannot be viewed as a Renyi entropy. Even more strikingly, the second statement shows that the one-to-one dispersion can actually conflict with the other entropy measures. This can be intuitively explained as follows. The Renyi entropies measure the degree to which the inputs have been mixed by the induced mapping ψ; for instance, a high min-entropy guarantees that not too many inputs have been mapped to the same output. On the other hand, in order to guarantee a high one-to-one dispersion, the mixing has to be controlled so that one output is reserved for each input in the one-to-one pre-image. This control may significantly reduce the entropy if the size of the one-to-one image is very close to the total number of images, which necessarily occurs if the former is very near |A|^ρ.

Proposition 3: First, for any term set Γ, there exists an interpretation ψ for Γ over any alphabet A such that γ_one(ψ) < H_∞(ψ). Second, let Γ_1 = {f(x_1, x_2)}, a term set with min-cut 1. Then for any α > 0 and any sufficiently large alphabet A, there exists an interpretation ψ for Γ_1 such that γ_one(ψ) > H_α(ψ). Furthermore, if an interpretation ψ for Γ_1 has maximal one-to-one Γ_1-dispersion, then H_1(ψ) tends to zero for large |A|. Conversely, if H_α(ψ) = 1 for some α > 0, then γ_one(ψ) = −∞.

Proof: First, if all the coding functions are constant, then the single output of ψ has |A|^k > 1 pre-images, so γ_one(ψ) = −∞ < 0 = H_∞(ψ). Second, one-to-one routing has maximal one-to-one Γ_1-dispersion equal to log_{|A|}(|A| − 1), which tends to 1, while its Renyi entropy (α ≠ 1) is given by

H_\alpha(\psi) = \frac{1}{1-\alpha}\log_{|A|}\left\{|A| - 1 + \left(|A|^2 - |A| + 1\right)^\alpha\right\} - \frac{2\alpha}{1-\alpha},

which tends to 1 − α/(1 − α) for α < 1/2 and to 0 for α ≥ 1/2. Conversely, if H_α(ψ) = 1, then all outputs have exactly |A| pre-images. ■
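The asymptotics in this proof can be checked numerically. The sketch below assumes, as the displayed formula does, that one-to-one routing for Γ_1 = {f(x_1, x_2)} leaves |A| − 1 outputs with a single pre-image and sends the remaining |A|² − |A| + 1 inputs to one common output; the function names are hypothetical helpers, not part of the paper.

```python
import math


def routing_profile(q: int) -> dict[int, int]:
    """Map |pre(b)| -> number of outputs b, for the assumed one-to-one routing of
    Gamma_1 = {f(x1, x2)} over |A| = q: q - 1 singleton outputs, one output
    collecting the remaining q**2 - q + 1 inputs."""
    return {1: q - 1, q * q - q + 1: 1}


def renyi(profile: dict[int, int], q: int, alpha: float) -> float:
    total = sum(size * count for size, count in profile.items())  # q**2 inputs
    if alpha == 1:
        return -sum(count * (size / total) * math.log(size / total, q)
                    for size, count in profile.items())
    return math.log(sum(count * (size / total) ** alpha
                        for size, count in profile.items()), q) / (1 - alpha)


def one_to_one_dispersion(profile: dict[int, int], q: int) -> float:
    singletons = profile.get(1, 0)
    return math.log(singletons, q) if singletons else -math.inf


def closed_form(q: int, alpha: float) -> float:
    """The expression displayed in the proof of Proposition 3 (alpha != 1)."""
    return (math.log(q - 1 + (q * q - q + 1) ** alpha, q) - 2 * alpha) / (1 - alpha)


for q in (10, 10**3, 10**5):
    prof = routing_profile(q)
    print(q, one_to_one_dispersion(prof, q), renyi(prof, q, 0.25), renyi(prof, q, 1))
```

As |A| grows, the first printed value approaches 1, the second approaches (1 − 2·0.25)/(1 − 0.25) = 2/3 and the third approaches 0, in line with the limits stated in the proof.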



B. Max-flow min-cut theorem for the Renyi entropy 

Similarly to the dispersion, we denote the maximum Renyi entropy over all interpretations for Γ over A as H_α(Γ, |A|). The max-flow min-cut theorem for the dispersion indicates that the Renyi entropy for α = 0 tends to the min-cut of the term set. We shall prove the following result.

Theorem 3 (Max-flow min-cut theorem for the Renyi entropy): Let Γ be a term set with min-cut of ρ. Then

\lim_{|A| \to \infty} H_\alpha(\Gamma, |A|) = \rho \quad \text{for all } 0 \le \alpha < 1.

Conversely, for any α > 1, there exists a term set Γ with min-cut ρ for which lim_{|A|→∞} H_α(Γ, |A|) < ρ.

The max-flow min-cut theorem for the Renyi entropy with α < 1 is actually based on dynamic routing.

Proposition 4 (Max-flow min-cut theorem for α < 1): Let Γ be a term set built on k variables and with min-cut of ρ, and for all ε > 0 and 0 < α < 1 define β = ρ + αk/(1 − α) and n_ε = (2|Γ_sub|)^{β/ε}. Then for any alphabet A with |A| ≥ n_ε, H_α(Γ, |A|) ≥ ρ − ε, which is achieved by dynamic routing.

Proof: Suppose that dynamic routing is used over an alphabet A (recall the notations from Section III-B). It is clear that any output of the form ((t_1, b_1), (t_2, b_2), ..., (t_ρ, b_ρ), (t_{ρ+1}, 1), ..., (t_r, 1)) ∈ (Γ × B)^r has exactly |B|^{k−ρ} pre-images, namely those of the form ((x_1, b_1), (x_2, b_2), ..., (x_k, b_k)) ∈ (Γ_var × B)^k, where b_{ρ+1}, ..., b_k ∈ B. Let us denote this set of |B|^ρ outputs as C and compute the Renyi entropy of dynamic routing:

H_\alpha(\psi) \ge \frac{1}{1-\alpha}\log_{|A|}\sum_{b \in C} p_b^\alpha = \frac{1}{1-\alpha}\log_{|A|}\left(|B|^{\rho}\,\frac{|B|^{\alpha(k-\rho)}}{|A|^{\alpha k}}\right)
= \rho - \beta\log_{|A|}\frac{|A|}{|B|}
\ge \rho - \beta\log_{|A|}\left(2|\Gamma_{sub}|\right)
\ge \rho - \varepsilon. ■



The Renyi entropy for large α is sensitive to some types of bottlenecks which cannot be handled with the graph-based approach. Indeed, we design below a family of term sets for which the Renyi entropy does not tend to the min-cut for values of α > 1 arbitrarily close to 1. Therefore, there is no max-flow min-cut theorem for the Renyi entropy with α > 1.

For all k ≥ 2, we define the set of k² terms, built on the k² variables x^a_j, 0 ≤ a, j ≤ k − 1, and on one function symbol f of arity k, to be

\Gamma_k = \left\{t_{i,j} = f\left(x^0_i, x^1_j, \ldots, x^{k-1}_j\right) : 0 \le i, j \le k - 1\right\}.

Proposition 5 (No max-flow min-cut theorem for α > 1): The term set Γ_k has min-cut of k². However, for α > k/(k − 1) and all A,

H_\alpha(\Gamma_k, |A|) \le \frac{(2k-1)\alpha - k}{\alpha - 1} < k^2.

Proof: First, we prove that Γ_k has min-cut of k² by constructing k² vertex-disjoint paths from Γ_var to Γ_k. We have x^0_j ∈ t_{j,j} for all j and, if a ≥ 1, x^a_j ∈ t_{b,j} for any 0 ≤ b ≤ k − 1. Therefore, the paths (x^0_j, t_{j,j}) for all j and (x^a_j, t_{a+j mod k, j}) for all a ≥ 1 and all j form a set of k² vertex-disjoint paths.
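The path system in this proof is easy to verify mechanically. The sketch below assumes the term set Γ_k = {f(x^0_i, x^1_j, ..., x^{k−1}_j)} as written above, and encodes the variable x^a_j as the pair (a, j) and the term t_{i,j} as the key (i, j); these encodings are our own choices.

```python
def gamma_k(k: int) -> dict[tuple[int, int], list[tuple[int, int]]]:
    """Arguments of t_{i,j} = f(x^0_i, x^1_j, ..., x^{k-1}_j); variable x^a_j is the pair (a, j)."""
    return {(i, j): [(0, i)] + [(a, j) for a in range(1, k)]
            for i in range(k) for j in range(k)}


def disjoint_paths(k: int) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Edges (x^0_j, t_{j,j}) and (x^a_j, t_{a+j mod k, j}) for a >= 1, as in the proof."""
    paths = [((0, j), (j, j)) for j in range(k)]
    paths += [((a, j), ((a + j) % k, j)) for a in range(1, k) for j in range(k)]
    return paths


def is_vertex_disjoint_system(k: int) -> bool:
    terms = gamma_k(k)
    paths = disjoint_paths(k)
    sources = {var for var, _ in paths}
    targets = {term for _, term in paths}
    # k^2 paths with pairwise distinct endpoints, each variable occurring in its target term
    return (len(paths) == len(sources) == len(targets) == k * k
            and all(var in terms[term] for var, term in paths))


print(all(is_vertex_disjoint_system(k) for k in range(2, 8)))
```

For k = 2 the encoding gives t_{0,1} = f(x^0_0, x^1_1), which matches f(x, z) in the term set of Section VI.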

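Second, we give an upper bound on the Renyi entropy of any interpretation ψ for Γ_k over an alphabet A. Consider the set C of inputs a ∈ A^{k²} whose coordinates corresponding to x^0_0, x^0_1, ..., x^0_{k−1} are all equal. If a ∈ C, then the terms t_{i,j} and t_{i',j} take the same value under ψ for all i, i' and j, and hence at most k of the terms take distinct values. Therefore |C| = |A|^{k²−k+1}, while the image ψ(C) has size at most |A|^k. Since Σ_{b∈ψ(C)} p_b ≥ |C|/|A|^{k²} = |A|^{−(k−1)} and 1/(1 − α) < 0 for α > 1, we have

H_\alpha(\psi) = \frac{1}{1-\alpha}\log_{|A|}\sum_{b} p_b^\alpha \le \frac{1}{1-\alpha}\log_{|A|}\sum_{b \in \psi(C)} p_b^\alpha
\le \frac{1}{1-\alpha}\log_{|A|}\left\{|A|^{k}\left(|A|^{-k}\,|A|^{-(k-1)}\right)^{\alpha}\right\} \qquad (2)
= \frac{(2k-1)\alpha - k}{\alpha - 1},

where (2) follows from the fact that, since α > 1, the summation is minimized when it has |A|^k terms that are all equal. ■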
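The bound of Proposition 5 and its threshold can be checked with a few lines of Python. The sketch below (our own helpers) confirms with exact rational arithmetic that ((2k − 1)α − k)/(α − 1) equals k² at α = k/(k − 1) and drops below it beyond that point, and then brute-forces all 16 binary coding functions for k = 2 and α = 3 as a sanity check against the bound 3.5.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import product


def renyi_bound(k, alpha):
    """((2k - 1) * alpha - k) / (alpha - 1), the bound of Proposition 5 for alpha > 1."""
    return ((2 * k - 1) * alpha - k) / (alpha - 1)


# Exact check: the bound equals k**2 at alpha = k/(k-1) and is smaller beyond it.
for k in range(2, 7):
    threshold = Fraction(k, k - 1)
    assert renyi_bound(k, threshold) == k * k
    assert renyi_bound(k, threshold + Fraction(1, 100)) < k * k


def renyi_gamma2(table, alpha):
    """H_alpha over A = {0, 1} for {f(x,y), f(x,z), f(w,y), f(w,z)} with f(a, b) = table[2a + b]."""
    def f(a, b):
        return table[2 * a + b]

    counts = Counter((f(x, y), f(x, z), f(w, y), f(w, z))
                     for x, y, z, w in product((0, 1), repeat=4))
    return math.log2(sum((c / 16) ** alpha for c in counts.values())) / (1 - alpha)


# Brute force over all 16 binary coding functions for k = 2 and alpha = 3 (bound: 3.5).
best = max(renyi_gamma2(table, 3) for table in product((0, 1), repeat=4))
print(best, renyi_bound(2, 3))
```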
The case of the Shannon entropy (α = 1) remains open. This is an important question, as a max-flow min-cut theorem for the Shannon entropy would mean that the amount of information obtained in the non-cooperative case is asymptotically equal to that in the cooperative case. We would like to highlight the difficulty of treating the Shannon entropy case. It can be shown that for any fixed n and any ε > 0, there exists 0 < α < 1 such that H_1(X) ≥ H_α(X) − ε for any probability distribution X on n points. However, we show below that α cannot be chosen independently of n.

Example 6: For 0 < α < 1 and n ≥ 2, consider the probability distribution X = X_{α,n} given by

p_1 = p_2 = \cdots = p_{n-1} = \frac{1-\alpha}{n-1}, \qquad p_n = \alpha.

We obtain (with logarithms in base n)

H_1(X) = -(1-\alpha)\log_n\frac{1-\alpha}{n-1} - \alpha\log_n\alpha,
H_\alpha(X) = \frac{1}{1-\alpha}\log_n\left((n-1)^{1-\alpha}(1-\alpha)^{\alpha} + \alpha^{\alpha}\right).

It is not hard to show that lim_{n→∞} H_1(X) = 1 − α while lim_{n→∞} H_α(X) = 1, and hence

\lim_{\alpha \to 1}\lim_{n \to \infty} H_1(X) = 0 \neq 1 = \lim_{\alpha \to 1}\lim_{n \to \infty} H_\alpha(X).
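A quick numerical look at Example 6 shows how slowly the limits set in. The sketch below evaluates the two closed forms above (with base-n logarithms) for α = 1/2 and growing n; α = 1/2 is an arbitrary choice made here, because for α close to 1 the convergence of H_α(X) to 1 requires astronomically large n.

```python
import math


def shannon(alpha: float, n: int) -> float:
    """H_1(X_{alpha,n}) in base n: n-1 atoms of mass (1-alpha)/(n-1) and one atom of mass alpha."""
    small = (1 - alpha) / (n - 1)
    return -(1 - alpha) * math.log(small, n) - alpha * math.log(alpha, n)


def renyi(alpha: float, n: int) -> float:
    """H_alpha(X_{alpha,n}) in base n, for 0 < alpha < 1."""
    total = (n - 1) ** (1 - alpha) * (1 - alpha) ** alpha + alpha ** alpha
    return math.log(total, n) / (1 - alpha)


for exponent in (3, 12, 50, 200):
    n = 10 ** exponent
    print(exponent, round(shannon(0.5, n), 4), round(renyi(0.5, n), 4))
```

The gap H_α(X) − H_1(X) approaches α as n grows, so no single α < 1 keeps it small for every n.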

In contrast, we remark that the diversified term sets satisfy a much more general max-flow min-cut theorem. Indeed, when using routing, the number of pre-images of any output is a constant given by |A|^{k−ρ}. We obtain the following result.

Proposition 6: For any term set Γ with min-cut of ρ, any α, and any alphabet A,

H_\alpha\left(\Gamma^{div}, |A|\right) = \rho.

Proposition 6 shows that the case of traditional network coding, where distributed coding functions are absent, is trivial. This motivates our study of term sets with distributed coding functions, which yield types of bottlenecks that cannot be captured by the usual directed graph approach.

V. Dispersion of linear coding functions 

A. Insufficiency of linear coding functions 

We now consider the important class of linear coding functions. First, scalar linear functions are defined when A is organized as a field F_q for some prime power q, and the coding functions f: F_q^m → F_q can be written as f(a_1, a_2, ..., a_m) = Σ_{i=1}^m a_i b_i for some b_1, b_2, ..., b_m ∈ F_q. A more general class is that of matrix linear coding functions, which are defined when A is organized as a finite vector space V and the coding functions f: V^m → V can be written as f(a_1, a_2, ..., a_m) = Σ_{i=1}^m F_i a_i, where F_1, F_2, ..., F_m are linear maps from V to V.

Clearly, if the coding functions are linear, then so is the induced mapping of the corresponding interpretation. The structure of linear maps allows us to characterize their one-to-one dispersion in Proposition 7 below.

Proposition 7: For any set Γ of terms built on k variables and with min-cut ρ, and for any linear interpretation ψ for Γ, we have γ_one(ψ) = ρ if and only if γ(ψ) = ρ = k, and γ_one(ψ) = −∞ otherwise.

Proof: Let A be a vector space of dimension r over a finite field F, and let ψ be a linear interpretation for Γ over A. Since the induced mapping is linear, each point in its image has exactly |ker(ψ)| = |F|^d pre-images, where d = dim(ker(ψ)). If d = 0, then each point in the image of ψ has exactly one pre-image, and hence γ_one(ψ) = k. Since γ_one(ψ) ≤ ρ by Lemma 1 while ρ ≤ k, we obtain γ_one(ψ) = k = ρ. Conversely, it is easily shown that if γ(ψ) = ρ = k, then γ_one(ψ) = ρ. If d ≥ 1, then each point in the image of ψ has more than one pre-image and γ_one(ψ) = −∞. ■
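The uniform pre-image structure used in this proof can be observed directly. The sketch below is an illustration under our own choices: it takes the term set {f(x, y), f(x, z), f(w, y), f(w, z)} of Section VI (that is, Γ_2 of Proposition 5) and every scalar linear coding function f(a_1, a_2) = c_1a_1 + c_2a_2 over Z_p for small primes p, and checks that all outputs have the same number p^d of pre-images, with d ≥ 1 in every case, so that γ_one(ψ) = −∞.

```python
from collections import Counter
from itertools import product


def preimage_sizes(c1: int, c2: int, p: int) -> set[int]:
    """Distinct pre-image sizes of psi for f(a1, a2) = c1*a1 + c2*a2 over Z_p,
    on the term set {f(x,y), f(x,z), f(w,y), f(w,z)}."""
    def f(a: int, b: int) -> int:
        return (c1 * a + c2 * b) % p

    counts = Counter((f(x, y), f(x, z), f(w, y), f(w, z))
                     for x, y, z, w in product(range(p), repeat=4))
    return set(counts.values())


def is_power_of(n: int, p: int) -> bool:
    while n % p == 0:
        n //= p
    return n == 1


for p in (2, 3, 5):
    for c1, c2 in product(range(p), repeat=2):
        sizes = preimage_sizes(c1, c2, p)
        # a single pre-image size |ker psi| = p^d, and d >= 1 for this term set
        assert len(sizes) == 1 and is_power_of(min(sizes), p) and min(sizes) >= p
print("all scalar linear maps checked")
```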

Because linear maps disperse information uniformly, the Renyi entropy of a linear map does not depend on the order α and is hence equal to its dispersion: for any linear interpretation ψ for Γ over A, we have H_α(ψ) = H_0(ψ) = γ(ψ) for all 0 ≤ α ≤ ∞. This shows a clear limitation of linear maps, as Theorem 3 shows that the min-entropy may not reach the min-cut. In particular, for the term set Γ_k introduced for Proposition 5, the min-cut is equal to k² while the min-entropy of any interpretation (and hence the dispersion of any linear interpretation) is upper bounded by 2k − 1.

As Theorem 1 indicates, although it is not always possible to reach the min-cut over a fixed finite alphabet, it can be reached asymptotically. The class of term sets can then naturally be divided into two disjoint classes, according to whether there exist coding functions that achieve perfect dispersion equal to the min-cut. If Γ has perfect dispersion, we also say that Γ is solvable, and a solution is an interpretation with dispersion equal to the min-cut. Solvable term sets are easily found; conversely, we implicitly proved that the term sets Γ_k are not solvable for any k ≥ 2.

More generally, the dispersion of a scalar linear interpretation is always an integer; therefore, if a term set Γ with min-cut ρ is not solvable, then the dispersion of any scalar linear interpretation for Γ is at most ρ − 1. However, by the max-flow min-cut theorem for the dispersion, there exist non-linear interpretations with dispersion arbitrarily close to ρ, and hence the highest dispersion may not always be achieved by scalar linear interpretations.

We now significantly strengthen the considerations above by designing a solvable term set for which linear functions have dispersion bounded by a constant, while the min-cut can be arbitrarily large. Let X = {x_1, x_2, ..., x_k} be a set of variables and consider k + 1 functions h_i of these variables: h_i(x_1, x_2, ..., x_k) = h_i(X). We then define the term set Γ built on the k variables in X and the function symbols h_1, h_2, ..., h_{k+1}, together with f of arity k + 1 and g_1, g_2, ..., g_{k+1}, all of arity 1, by

r = [f{gi{hi{X))M{X). hsiX), . . . , : 1 < i < + 1} . 

Proposition 8: There exists n ∈ ℕ such that for any A with |A| ≥ n, the term set Γ defined above is solvable over A, i.e. γ(Γ, |A|) = k. However, any linear interpretation for Γ has dispersion at most 2.

The proof of Proposition 8 is given in Appendix B. The construction above can easily be generalized to obtain the following result.

Corollary 1: For any integers k ≥ l ≥ 2, there exists a solvable term set Γ with dispersion k over all alphabets of sufficiently large size, where l is the maximal Γ-dispersion that can be achieved by (matrix) linear coding functions.

B. Low-degree solutions 



In Section V-A we showed that there can be a huge difference between the dispersion achievable by linear coding functions and by non-linear coding functions. In this section we show that if the min-cut is achievable by coding functions of low degree (i.e. of constant degree, independent of the size of the underlying field), then the min-cut is actually achievable by linear coding functions.

More precisely, let Γ = {t_1, t_2, ..., t_r} be a term set built on the variables {x_1, x_2, ..., x_k} and with min-cut of ρ. Let ψ be an interpretation for Γ over an alphabet A with perfect Γ-dispersion of ρ. Then decoding functions for ψ are functions d_1, d_2, ..., d_ρ: A^r → A such that there exist indices i_1, i_2, ..., i_ρ for which

(a_{i_1}, a_{i_2}, \ldots, a_{i_\rho}) = \left(d_1(\psi(a)), d_2(\psi(a)), \ldots, d_\rho(\psi(a))\right) \quad \text{for all } a \in A^k.

Theorem 4 (Low-degree solutions imply linear solutions): Let Γ be a term set built on k variables and with min-cut of ρ. Assume that there exist coding and decoding functions, defined by fixed polynomials p_1, p_2, ..., p_l ∈ ℤ[a_1, a_2, ..., a_k], that achieve dispersion ρ over arbitrarily large fields F of characteristic q. Then there exist scalar linear coding and decoding functions over all sufficiently large fields F of characteristic q that achieve perfect dispersion of ρ.

Proof: Consider a solution ψ based on polynomials of fixed degree. Since the coding and decoding functions are all polynomials of fixed degrees, their compositions are also polynomials of fixed degrees. Suppose q^k is greater than all polynomial degrees and let L be the linear-part operator: for any multivariate polynomial p over F_{q^k}, L(p) is the linear part of p. Then it is easy to check that L(p_1 ∘ p_2) = L(p_1) ∘ L(p_2) if the polynomial p_1 ∘ p_2 has degree less than q^k. Let us now consider the interpretation L(ψ), defined by taking the linear part of each coding function in ψ. The induced mapping of L(ψ) is then equal to the linear part of the induced mapping of ψ. Since d_j(ψ(a)) = a_{i_j} for all j, we have L(d_j) ∘ L(ψ) = L(d_j ∘ ψ) = a_{i_j}, and hence L(ψ) is a linear solution with linear decoding functions. ■
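The identity L(p_1 ∘ p_2) = L(p_1) ∘ L(p_2) is easiest to see in the univariate case. The sketch below is a simplified illustration, not the multivariate setting of the proof: it works with formal univariate polynomials with coefficients modulo a prime q, stored as dictionaries {exponent: coefficient}, and assumes that the polynomials have no constant term. That assumption is genuinely needed: with p_2(x) = 1 + x and p_1(x) = x², the composition has linear part 2x while L(p_1) = 0.

```python
import random


def poly_mul(p1: dict[int, int], p2: dict[int, int], q: int) -> dict[int, int]:
    out: dict[int, int] = {}
    for e1, c1 in p1.items():
        for e2, c2 in p2.items():
            out[e1 + e2] = (out.get(e1 + e2, 0) + c1 * c2) % q
    return {e: c for e, c in out.items() if c}


def compose(p1: dict[int, int], p2: dict[int, int], q: int) -> dict[int, int]:
    """Coefficients of p1(p2(x)) modulo q, for formal univariate polynomials."""
    out: dict[int, int] = {}
    power = {0: 1}  # p2(x)**e, starting from e = 0
    for e in range(max(p1, default=0) + 1):
        for d, c in power.items():
            out[d] = (out.get(d, 0) + p1.get(e, 0) * c) % q
        power = poly_mul(power, p2, q)
    return {e: c for e, c in out.items() if c}


def linear_part(p: dict[int, int], q: int) -> dict[int, int]:
    c = p.get(1, 0) % q
    return {1: c} if c else {}


rng = random.Random(0)
q = 7
for _ in range(200):
    # assumption: random polynomials of degree <= 4 WITHOUT a constant term
    p1 = {e: rng.randrange(q) for e in range(1, 5)}
    p2 = {e: rng.randrange(q) for e in range(1, 5)}
    assert linear_part(compose(p1, p2, q), q) == compose(linear_part(p1, q), linear_part(p2, q), q)

# With a constant term the identity can fail:
p1, p2 = {2: 1}, {0: 1, 1: 1}  # x^2 and 1 + x
print(linear_part(compose(p1, p2, q), q), compose(linear_part(p1, q), linear_part(p2, q), q))  # {1: 2} {}
```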



We remark that the results in Section V-A imply that, conversely, for each characteristic q there exist term sets with solutions which require that at least some of the involved coding functions (including decoding functions) be given by polynomials of degree at least [...]. Example 7 gives a term set with no linear solution, for which a solution is given by polynomials whose degrees depend on the size of the alphabet.

Example 7: Let

\Gamma = \left\{t_1 = f(f(x_1, x_2), f(x_2, x_1)),\ t_2 = g(g(x_1, x_2), g(x_2, x_1))\right\}.

This term set has no (scalar) linear solution over fields of characteristic 2, yet it has non-linear solutions over the fields F_q with q = 2^k and k even.

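Proof: Let f and g be linear functions, i.e. f(a_1, a_2) = αa_1 + βa_2 and g(a_1, a_2) = γa_1 + δa_2. Then, since the characteristic is 2,

(t_1, t_2) = \left((\alpha^2 + \beta^2)\,a_1,\ (\gamma^2 + \delta^2)\,a_1\right),

which does not depend on a_2. Therefore, there are no linear solutions.

If q = 2^k where k is even, then a ↦ a^{√q} is a field automorphism of F_q of order 2. First remark that the polynomial u^{√q} + u has at most √q < q roots in F_q; hence there exists r ∈ F_q such that r + r^{√q} ≠ 0. Then let f(a_1, a_2) = a_1^{√q} + r a_2^{√q} and g(a_1, a_2) = a_1. We obtain

(t_1, t_2) = \left((1 + r^{\sqrt{q}+1})\,a_1 + (r + r^{\sqrt{q}})\,a_2,\ a_1\right),
(a_1, a_2) = \left(t_2,\ (r + r^{\sqrt{q}})^{-1}\left(t_1 - (1 + r^{\sqrt{q}+1})\,t_2\right)\right).

Note that the degree of the solution depends on q, and hence Theorem 4 does not apply. ■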
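Both halves of this argument can be confirmed by brute force over the smallest admissible field, F_4 (q = 4, k = 2, √q = 2). The sketch below represents F_4 as GF(2)[x]/(x² + x + 1), with elements encoded as 2-bit integers (an encoding chosen here), checks that the non-linear pair f(a_1, a_2) = a_1² + r a_2², g(a_1, a_2) = a_1 with r = x is a solution, and searches all 256 scalar linear pairs for a solution.

```python
from itertools import product

MODULUS = 0b111  # x^2 + x + 1, irreducible over GF(2)


def gf4_mul(a: int, b: int) -> int:
    """Multiply in GF(4) = GF(2)[x]/(x^2 + x + 1); elements are 2-bit ints, addition is XOR."""
    result = 0
    for bit in range(2):
        if (b >> bit) & 1:
            result ^= a << bit
    if result & 0b100:  # replace x^2 by x + 1
        result ^= MODULUS
    return result


def solves(f, g) -> bool:
    """True if (a1, a2) -> (t1, t2) is injective on GF(4)^2, i.e. dispersion 2."""
    images = {(f(f(a1, a2), f(a2, a1)), g(g(a1, a2), g(a2, a1)))
              for a1, a2 in product(range(4), repeat=2)}
    return len(images) == 16


R = 0b10  # r = x, so r + r^2 = x + (x + 1) = 1, which is non-zero


def f_nonlinear(a1: int, a2: int) -> int:
    return gf4_mul(a1, a1) ^ gf4_mul(R, gf4_mul(a2, a2))  # a1^2 + r * a2^2


def g_nonlinear(a1: int, a2: int) -> int:
    return a1


def linear(c1: int, c2: int):
    return lambda a1, a2: gf4_mul(c1, a1) ^ gf4_mul(c2, a2)


nonlinear_ok = solves(f_nonlinear, g_nonlinear)
linear_ok = any(solves(linear(al, be), linear(ga, de))
                for al, be, ga, de in product(range(4), repeat=4))
print(nonlinear_ok, linear_ok)  # True False
```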

VI. Case study involving a single coding function

The purpose of this section is twofold. First, we illustrate the different concepts introduced throughout the paper (distributed coding functions, insufficiency of linear coding functions, Renyi entropy, etc.). Second, although the results obtained so far are quite tight, we emphasize that for a specific term set more can usually be said: tighter bounds can be derived, and other types of functions can be considered, such as linear functions over rings.

Throughout this section, we consider the following term set:

\Gamma = \{f(x, y), f(x, z), f(w, y), f(w, z)\},

which is actually the term set Γ_k introduced in Proposition 5 for k = 2. The graph G_Γ is given in Figure 3 below. The term set Γ can be viewed as an abstraction of a many-to-one cast with four sources and one user, where the intermediate node f corresponds to a relay which only picks up two signals at a time.

As seen previously, Γ has min-cut 4: Γ_var is a term-cut of size 4, and (x, f(x, y)), (z, f(x, z)), (y, f(w, y)), and (w, f(w, z)) are 4 vertex-disjoint paths from Γ_var to Γ. According to the max-flow min-cut theorem for the dispersion, for any given ε > 0 we can select a (coding) function f: A² → A with Γ-dispersion at least 4 − ε for each sufficiently large alphabet A. By Proposition 2, the dispersion of the diversified term set is γ_one(Γ^div, |A|) = γ(Γ^div, |A|) = 4 for all |A| ≥ 2, and perfect one-to-one dispersion can be achieved by routing in the diversified case. In other words, all demands can be satisfied independently. However, Proposition 9 shows that the whole problem does not have any solution.

Fig. 3. The graph G_Γ (variables x, y, z, w; terms f(x, y), f(x, z), f(w, y), f(w, z); each variable points to the terms that contain it).
Proposition 9: For any alphabet A, γ(Γ, |A|) < 4. In fact, we have the tighter bound

\gamma(\Gamma, |A|) \le 4 + \log_{|A|}\left(1 - 2|A|^{-1} + 3|A|^{-2} - |A|^{-3}\right).

Proof: We prove the tighter bound by refining the argument in the proof of Proposition 5. Write a = (a_1, a_2, a_3, a_4) for the values of (x, y, z, w), and partition the set of inputs A^4 into four parts C_1, C_2, C_3, and C_4 defined as follows. The set C_1 consists of the inputs a where a_1 ≠ a_4 and a_2 ≠ a_3. This set contains |A|²(|A| − 1)² points, hence the image of C_1 contains at most |A|²(|A| − 1)² points. The set C_2 consists of the inputs where a_1 = a_4 and a_2 ≠ a_3, and the set C_3 consists of the inputs where a_1 ≠ a_4 and a_2 = a_3. Finally, C_4 is the set of inputs where a_1 = a_4 and a_2 = a_3. The outputs of C_2, C_3 and C_4 have the forms (u, v, u, v), (u, u, v, v) and (u, u, u, u), respectively, for some u, v ∈ A. Hence, apart from the |A| constant words (u, u, u, u), which cover the image of C_4, the images of C_2 and C_3 each contain at most |A|(|A| − 1) further points. Thus the range contains at most |A|²(|A| − 1)² + 2|A|(|A| − 1) + |A| = |A|^4 − 2|A|^3 + 3|A|^2 − |A| outputs, which gives the bound. ■
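The counting bound of Proposition 9 is tight for the binary alphabet, which the following brute-force sketch (our own helper names) confirms by enumerating all 16 coding functions f: {0,1}² → {0,1}.

```python
from itertools import product


def image_size(f, q: int) -> int:
    """Number of outputs of psi for {f(x,y), f(x,z), f(w,y), f(w,z)} over {0, ..., q-1}."""
    return len({(f(x, y), f(x, z), f(w, y), f(w, z))
                for x, y, z, w in product(range(q), repeat=4)})


def prop9_bound(q: int) -> int:
    """|A|^4 - 2|A|^3 + 3|A|^2 - |A| outputs, as in the proof of Proposition 9."""
    return q**4 - 2 * q**3 + 3 * q**2 - q


# Every binary coding function is a lookup table f(a, b) = table[2a + b].
largest = max(image_size(lambda a, b, t=table: t[2 * a + b], 2)
              for table in product((0, 1), repeat=4))
print(largest, prop9_bound(2))  # both equal 10: the bound is attained for |A| = 2
```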



We proved in Section V-A that linear maps defined over fields are not optimal when the term set is not solvable. We extend this result below to linear maps defined over rings.

Proposition 10: Assume A is organized as a ring and that f: A² → A is a linear map in the usual sense of algebra, i.e. f(a_1, a_2) = r_1a_1 + r_2a_2 for some r_1, r_2 ∈ A. Then f has Γ-dispersion at most 3, which is achieved for r_1 = r_2 = 1.

Proof: We first prove the upper bound. Notice that the ring might not be commutative, so we do not assume that r_1r_2 = r_2r_1. Then

\psi(a_1, a_2, a_3, a_4) = \left(r_1a_1 + r_2a_2,\ r_1a_1 + r_2a_3,\ r_1a_4 + r_2a_2,\ r_1a_4 + r_2a_3\right) \in A^4,

which is uniquely determined by the word (r_1a_1 + r_2a_2, r_2(a_2 − a_3), r_1a_4 + r_2a_3) ∈ A³. We conclude that the size of the image of ψ is no more than |A|³. This proves the upper bound. For the lower bound, let f(a_1, a_2) = a_1 + a_2 and consider the set C ⊆ A^4 of inputs with last coordinate equal to 0; we have

\psi(a_1, a_2, a_3, 0) = (a_1 + a_2,\ a_1 + a_3,\ a_2,\ a_3) \in A^4,

and hence the image of C has size |A|³. Thus f has a Γ-dispersion equal to 3. ■
Similarly, if the alphabet is organized as a group G, we define the coding function f_G: G² → G as f_G(α, β) = αβ for all α, β ∈ G. Then it can easily be shown that f_G has Γ-dispersion 3 for any group G.
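Proposition 10 and the group remark can be spot-checked on small non-field alphabets. The sketch below, whose choice of Z_4, Z_6 and the symmetric group S_3 is ours, enumerates all linear maps r_1a_1 + r_2a_2 over Z_n and the group product on S_3, with permutations stored as tuples.

```python
from itertools import permutations, product


def image_size(f, elements) -> int:
    """Number of outputs of psi for {f(x,y), f(x,z), f(w,y), f(w,z)} over `elements`."""
    return len({(f(x, y), f(x, z), f(w, y), f(w, z))
                for x, y, z, w in product(elements, repeat=4)})


def ring_check(n: int) -> bool:
    """Largest image over all f(a1, a2) = r1*a1 + r2*a2 in Z_n is n^3, reached at r1 = r2 = 1."""
    sizes = {(r1, r2): image_size(lambda a, b, r1=r1, r2=r2: (r1 * a + r2 * b) % n, range(n))
             for r1, r2 in product(range(n), repeat=2)}
    return max(sizes.values()) == n**3 == sizes[(1, 1)]


def perm_product(s: tuple, t: tuple) -> tuple:
    """Group law of S_3 on permutations stored as tuples: (s*t)(i) = s(t(i))."""
    return tuple(s[t[i]] for i in range(len(t)))


s3 = list(permutations(range(3)))
print(ring_check(4), ring_check(6), image_size(perm_product, s3), len(s3) ** 3)
```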



We would like to illustrate the difference between the performance measures introduced in Section IV by considering A = F_2. In this case, there are 2^4 = 16 choices for the coding function f. However, it can easily be shown that any function is equivalent, in terms of the random variable ψ(a), to one of the following four functions: f_0(a_1, a_2) = 0, f_1(a_1, a_2) = a_1, f_2(a_1, a_2) = a_1 + a_2, and f_3(a_1, a_2) = a_1a_2. We easily obtain that f_0, f_1, and f_2 have dispersion 0, 2, and 3, respectively; since they are linear functions, they all have one-to-one dispersion equal to −∞. On the other hand, the behavior of the non-linear function f_3 is more complex, as its image consists of 10 elements: the 9 elements of its one-to-one image together with the all-zero vector, which has 7 pre-images. Therefore, f_3 has dispersion γ(f_3) = log_2 10 ≈ 3.32 and one-to-one dispersion γ_one(f_3) = log_2 9 ≈ 3.17. However, f_3 has Shannon entropy H_1(f_3) = 4 − (7/16) log_2 7 ≈ 2.77, which is lower than the Shannon entropy of f_2. Furthermore, f_3 has min-entropy H_∞(f_3) = 4 − log_2 7 ≈ 1.19, which is lower than those of f_1 and f_2.
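The figures quoted above for A = F_2 can be reproduced by enumerating the 16 inputs. In the sketch below (helper names are ours), the four representative functions are written with XOR for addition and AND for multiplication over F_2.

```python
import math
from collections import Counter
from itertools import product


def profile(f):
    """Pre-image sizes of psi over A = F_2 for {f(x,y), f(x,z), f(w,y), f(w,z)}."""
    return Counter((f(x, y), f(x, z), f(w, y), f(w, z))
                   for x, y, z, w in product((0, 1), repeat=4)).values()


def measures(f):
    sizes = list(profile(f))
    probs = [s / 16 for s in sizes]
    singletons = sum(1 for s in sizes if s == 1)
    return {
        "dispersion": math.log2(len(sizes)),
        "one_to_one": math.log2(singletons) if singletons else -math.inf,
        "shannon": -sum(p * math.log2(p) for p in probs),
        "min_entropy": -math.log2(max(probs)),
    }


functions = {
    "f0": lambda a1, a2: 0,
    "f1": lambda a1, a2: a1,
    "f2": lambda a1, a2: a1 ^ a2,  # a1 + a2 over F_2
    "f3": lambda a1, a2: a1 & a2,  # a1 * a2 over F_2
}
for name, f in functions.items():
    print(name, {key: round(val, 2) for key, val in measures(f).items()})
```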

We now consider the reverse illustration: we fix the coding function, but we change the alphabet on which it is defined. By use of computer calculations, we can show that the coding function f(a_1, a_2) = (a_1 − a_2)² + a_1 + a_2 over the ring Z_3 provides an optimal Γ-dispersion of log_3 51 ≈ 3.58, which attains the upper bound in Proposition 9. The same coding function also provides an optimal one-to-one Γ-dispersion of log_3 36 ≈ 3.26. In general, we shall denote the interpretation for Γ based on the function f defined above over Z_{|A|} as ψ_{|A|}. Using an argument based on the Fourier transform [17], one can fully determine the number of elements of the image of that interpretation with a given number of pre-images when the alphabet size is a prime number |A| = p. Thus the entropy H_α(ψ_p) can be determined for all α; its Γ-dispersion and one-to-one Γ-dispersion are respectively given by

\gamma(\psi_p) = 4 - \log_p 2 + \log_p\left(1 + p^{-1} - p^{-2} + p^{-3}\right),
\gamma_{\mathrm{one}}(\psi_p) = 3 + \log_p 3 + 2\log_p\left(1 - p^{-1}\right). \qquad (3)

Fig. 4. Dispersion and Renyi entropy of the interpretation ψ_p (p prime)

The behavior of the Renyi entropy of ψ_p for different values of α, together with its one-to-one dispersion, is displayed in Figure 4. Moreover, when p tends to infinity, the Renyi entropy actually reaches the upper bound from Proposition 5; a more detailed argument is given in Appendix C. On the other hand, by (3), the one-to-one dispersion of ψ_p tends to 3 on the primes p, which is below the min-cut. Thus, the interpretation ψ_p is asymptotically optimal in terms of Renyi entropy but not in terms of one-to-one dispersion. This illustrates the fundamental difference between the one-to-one dispersion and the different entropy measures.

Computer calculations show that the dispersion and one-to-one dispersion of ψ_n behave in a slightly irregular fashion when n is composite. We notice, for instance, that the one-to-one dispersion is surprisingly large for prime powers, as seen in Figure 5.
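The Z_3 figures and formula (3) can be cross-checked by enumeration. The sketch below (our own helpers) builds ψ_n from f(a_1, a_2) = (a_1 − a_2)² + a_1 + a_2 over Z_n and prints, for a few odd primes, the enumerated image size and one-to-one image size next to the values |A|^γ and |A|^{γ_one} predicted by (3).

```python
import math
from collections import Counter
from itertools import product


def interpretation_profile(n: int) -> Counter:
    """Pre-image counts of psi_n, built from f(a1, a2) = (a1 - a2)^2 + a1 + a2 over Z_n."""
    def f(a: int, b: int) -> int:
        return ((a - b) ** 2 + a + b) % n

    return Counter((f(x, y), f(x, z), f(w, y), f(w, z))
                   for x, y, z, w in product(range(n), repeat=4))


def formula_dispersion(p: int) -> float:
    return 4 - math.log(2, p) + math.log(1 + p**-1 - p**-2 + p**-3, p)


def formula_one_to_one(p: int) -> float:
    return 3 + math.log(3, p) + 2 * math.log(1 - 1 / p, p)


for p in (3, 5, 7):
    counts = interpretation_profile(p)
    image = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    print(p, image, round(p ** formula_dispersion(p), 3),
          unique, round(p ** formula_one_to_one(p), 3))
```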

VII. Multi-user max-flow min-cut theorem 

We now consider multi-user communications. To each user 1 ≤ j ≤ m, we associate a term set Γ_j. Each choice of coding functions then determines the dispersions of (Γ_1, Γ_2, ..., Γ_m), which form an array of dispersions. Before detailing how the term sets correspond to the multi-user communication problem, let us derive the max-flow min-cut theorem for a collection of term sets.

Fig. 5. Dispersion and one-to-one dispersion of the interpretation ψ_p (p odd)

Theorem 5 (Multi-user max-flow min-cut theorem): Let Γ_1, ..., Γ_m be term sets with respective min-cuts ρ_1, ρ_2, ..., ρ_m. Then for any ε > 0, there exists n_0 such that for all A with |A| ≥ n_0, a single choice of coding functions yields interpretations ψ_j for the Γ_j with γ_one(ψ_j) ≥ ρ_j − ε for all 1 ≤ j ≤ m. Also, for any 0 < α < 1, there exists n such that for all |A| ≥ n, a single choice of coding functions yields H_α(ψ_j) ≥ ρ_j − ε for all 1 ≤ j ≤ m.

The proof is based on diversifying variables. Let X = {x_1, x_2, ..., x_k} be the set of variables on which all terms are built. For any 1 ≤ j ≤ m, let X^j = {x^j_1, x^j_2, ..., x^j_k} be a new set of variables, and let Γ^j be the term set obtained from Γ_j by replacing the variable x_i by x^j_i for all 1 ≤ i ≤ k. It is clear that any interpretation for Γ_j can be viewed as an interpretation for Γ^j, and that its Γ_j-dispersion (as well as its one-to-one dispersion and its Renyi entropy) is equal to its Γ^j-dispersion (respectively, one-to-one dispersion and Renyi entropy).

Consider now Γ̂ := ∪_{j=1}^m Γ^j, where the union is disjoint; its min-cut is equal to ρ = Σ_{j=1}^m ρ_j. In fact, the graphs G_{Γ^j} are the components of G_Γ̂. Since the sets X^j are disjoint, ψ(a) is the concatenation of the outputs of the corresponding interpretations ψ^j for the Γ^j, so that for any interpretation ψ for Γ̂ we have γ_one(ψ) = Σ_{j=1}^m γ_one(ψ^j), and similarly for the dispersion and the Renyi entropy. Therefore, the theorem is proved by choosing an interpretation ψ for Γ̂ such that γ_one(ψ) ≥ ρ − ε: since γ_one(ψ^j) ≤ ρ_j for all j, we obtain γ_one(ψ^j) ≥ ρ_j − ε for any j. ■
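Diversifying variables is a purely syntactic operation on terms. The sketch below is our own illustration: it encodes a variable as a string such as "x1" and an application as a tuple (symbol, arguments...), renames the variables of the j-th term set to fresh indices while keeping the function symbols shared, and applies this to the two users of the butterfly network. The result coincides with the term set of Problem 1 below.

```python
from typing import Union

# A term is either a variable name such as "x1" or a tuple (function_symbol, arg1, ..., argn).
Term = Union[str, tuple]


def rename(term: Term, mapping: dict[str, str]) -> Term:
    """Replace variables according to mapping; function symbols are kept, so they stay shared."""
    if isinstance(term, str):
        return mapping.get(term, term)
    head, *args = term
    return (head, *(rename(arg, mapping) for arg in args))


def diversify_variables(term_sets: list[list[Term]], k: int) -> list[Term]:
    """Disjoint union of the Gamma^j: user j (0-based) gets fresh variables x_{jk+1}, ..., x_{jk+k}."""
    result: list[Term] = []
    for j, terms in enumerate(term_sets):
        mapping = {f"x{i}": f"x{j * k + i}" for i in range(1, k + 1)}
        result.extend(rename(t, mapping) for t in terms)
    return result


def variables(term: Term) -> set[str]:
    if isinstance(term, str):
        return {term}
    return set().union(*(variables(arg) for arg in term[1:]))


# The two users of the butterfly network receive {x1, f(x1, x2)} and {f(x1, x2), x2}.
users = [["x1", ("f", "x1", "x2")], [("f", "x1", "x2"), "x2"]]
print(diversify_variables(users, 2))
```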

A. Multi-user communication problems and term sets 

We now apply the multi-user max-flow min-cut theorem to communication problems. A many-to-many cast is defined as follows.

Definition 9: A multi-user communication problem instance (also referred to as a many-to-many cast) is a tuple (V, E, S, U, A), where

• G = (V, E) is an acyclic directed graph, where the vertices v_1, v_2, ..., v_{|V|} of V are sorted such that (v_i, v_j) ∈ E only if i < j.

• S = {s_1, s_2, ..., s_k} ⊆ V is the set of sources, which are nodes with in-degree 0. Without loss of generality, s_i = v_i for 1 ≤ i ≤ k.

• U = {r_1, r_2, ..., r_m} ⊆ V is the set of users (receivers), which are nodes with out-degree 0. Without loss of generality, r_j = v_{|V|−m+j} for 1 ≤ j ≤ m.

• A is an alphabet of size |A| ≥ 2. Each source s_i sends a distinct message a_i ∈ A, and each user r_j requests {a_1, a_2, ..., a_k}.

Each vertex v ∈ V \ S can manipulate the data it receives and transmit a function of its inputs onto all its out-edges.

A communication problem can be equivalently defined with terms. Each user obtains terms built on the variables sent by the sources, where the function symbols represent the operations performed by the intermediate nodes in V \ (S ∪ U). More formally, to each source s_i we associate the variable x_i and the term u_i = x_i, and we denote X = {x_1, x_2, ..., x_k}. Each user r_j requests all the variables in X. We then associate the function symbol f_l to each intermediate node v_l ∈ V \ (S ∪ U), and each such vertex v_l is recursively assigned the term u_l = f_l(u_{l_1}, u_{l_2}, ..., u_{l_d}), where {v_{l_1}, v_{l_2}, ..., v_{l_d}} is the in-neighborhood of v_l. We denote the in-neighborhood of the user r_j as {v_{j,1}, v_{j,2}, ..., v_{j,d_j}}. Note that, using this notation, it is possible that v_{j,i} = v_{j',i'} for distinct j and j'; however, this will not affect our definitions below. We finally associate the term t_{j,i} to the vertex v_{j,i}.
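This recursive assignment is straightforward to implement. The sketch below uses a hypothetical five-node instance of our own (sources s1 and s2, one intermediate node v, users r1 and r2, listed in a topological order as in Definition 9) and builds the term set Γ_j of each user from its in-neighborhood; the node names and the encoding of terms as nested tuples are assumptions made for illustration.

```python
from typing import Union

Term = Union[str, tuple]  # variable name, or (function_symbol, *arguments)

# Hypothetical instance: each entry lists the in-neighbours of a node.
in_neighbours = {
    "s1": [], "s2": [],
    "v": ["s1", "s2"],
    "r1": ["s1", "v"],
    "r2": ["v", "s2"],
}
order = ["s1", "s2", "v", "r1", "r2"]  # topological order, as in Definition 9
sources = {"s1": "x1", "s2": "x2"}
users = ["r1", "r2"]


def assign_terms() -> dict[str, Term]:
    """u = x_i at sources and u_l = f_l(u_{l_1}, ..., u_{l_d}) at intermediate nodes."""
    term: dict[str, Term] = {}
    for node in order:
        if node in sources:
            term[node] = sources[node]
        elif node not in users:
            term[node] = (f"f_{node}", *(term[u] for u in in_neighbours[node]))
    return term


def user_term_sets() -> dict[str, list[Term]]:
    """Gamma_j = the terms attached to the in-neighbourhood of user r_j."""
    term = assign_terms()
    return {r: [term[u] for u in in_neighbours[r]] for r in users}


print(user_term_sets())
```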

A solution for the many-to-many cast instance is a choice of the functions at the intermediate nodes such that all the users' demands can be satisfied at the same time. Using terms, user j is satisfied if and only if it can recover X from the term set Γ_j = {t_{j,1}, t_{j,2}, ..., t_{j,d_j}} obtained from its in-neighborhood. This implies that the term set Γ_j must have dispersion k. In order to take into account the fact that all demands have to be satisfied at the same time, we use the term set Γ̂ defined by diversifying variables. By our construction and the remarks on Γ̂ made above, we obtain the following result.

Proposition 11: A many-to-many cast instance is solvable over A if and only if γ_one(Γ_j, |A|) = γ(Γ_j, |A|) = k for all 1 ≤ j ≤ m or, equivalently,

\gamma_{\mathrm{one}}(\hat\Gamma, |A|) = \gamma(\hat\Gamma, |A|) = km.

Fig. 6. Butterfly network.

The multi-user max-flow min-cut theorem then shows that if the min-cut between the sources and each user is equal to k, then the multi-user instance is asymptotically solvable. Note that the equivalence in Proposition 11 together with Theorem 5 indicates that if each user can be asymptotically satisfied individually, then all the users' demands can asymptotically be satisfied simultaneously.
The following two examples are consequences of Proposition 11.

Example 8: In [1], Yeung and Zhang made a simple observation that laid the foundation for network coding. The authors considered a situation where two users communicate via a satellite. User X wants to send a message x ∈ A to Y, while Y at the same time wants to send a message y ∈ A to X.

The satellite communication problem is equivalent to the communication problem in Figure 6(a), referred to as the butterfly network. In Figure 6(a), the function symbols are assigned to the vertices, according to Definition 9; however, the problem is sometimes represented in the literature as in Figure 6(b), where the function symbols are assigned to the edges. A solution over an alphabet A is a function f: A² → A with the property that there exist decoding functions d_1, d_2: A² → A such that b = d_1(a, f(a, b)) and a = d_2(b, f(a, b)) for any (a, b) ∈ A².

By Proposition 11, the satellite problem (and the butterfly communication problem) is mathematically equivalent to the following problem on terms.

Problem 1 (Equivalent to the satellite communication problem): Construct a function f: A² → A with Γ-dispersion equal to 4, where Γ := {x_1, f(x_1, x_2), f(x_3, x_4), x_4}.

Fig. 7. Graph G_Γ for the butterfly network.



This term set corresponds to the graph with vertex set

V = \Gamma_{sub} = \{x_1, x_2, x_3, x_4, f(x_1, x_2), f(x_3, x_4)\},

source set S = {x_1, x_2, x_3, x_4} and target set T = Γ. The edge set E ⊆ V × V is given by

E = \{(x_1, f(x_1, x_2)), (x_2, f(x_1, x_2)), (x_3, f(x_3, x_4)), (x_4, f(x_3, x_4))\}.

The graph G_Γ is displayed in Figure 7.

Problem 1 can be viewed as a communication problem between a single source that transmits a message (a_1, a_2, a_3, a_4) ∈ A^4 and a user who receives a message of the form (a_1, f(a_1, a_2), f(a_3, a_4), a_4) ∈ A^4.
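Over Z_q, the classic choice f(a, b) = a + b solves Problem 1, as the following brute-force sketch checks for small q. This choice of f and the helper names are ours; the decoder simply subtracts the known message, which gives both d_1(a, c) = c − a and d_2(b, c) = c − b.

```python
from itertools import product


def butterfly_has_full_dispersion(q: int) -> bool:
    """psi(a) = (a1, f(a1,a2), f(a3,a4), a4) with f(a, b) = a + b mod q is injective on Z_q^4."""
    def f(a: int, b: int) -> int:
        return (a + b) % q

    images = {(a1, f(a1, a2), f(a3, a4), a4)
              for a1, a2, a3, a4 in product(range(q), repeat=4)}
    return len(images) == q**4


def decode(known: int, coded: int, q: int) -> int:
    """Recover the other message from one's own message and f(a, b) = a + b mod q."""
    return (coded - known) % q


print(all(butterfly_has_full_dispersion(q) for q in range(2, 7)))
```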

Example 9 (Distributed storage): Assume that we want to store two messages x, y ∈ A at four locations. The messages x and y are stored at two of the locations. At the two remaining locations, two messages f(x, y) ∈ A and g(x, y) ∈ A are stored. The problem is to select the coding functions f: A² → A and g: A² → A such that it is always possible to reconstruct x and y by accessing only two of the four locations.

This type of problem has already been studied in the literature [18] as part of network coding, as well as an application of error-correcting codes. The actual problem can be shown to be equivalent to the existence of two orthogonal Latin squares of order |A| [19]. This problem was first posed by Euler around 1780 and was eventually completely solved in 1960, when it was shown in [20] that there exist orthogonal Latin squares of any order except 2 and 6. The distributed storage problem is mathematically equivalent to the following problem.

Problem 2 (Equivalent to the distributed storage problem): Construct two functions f, g: A² → A with Γ-dispersion equal to 10, where

\Gamma := \{x_1, f(x_1, y_1), y_2, f(x_2, y_2), x_3, g(x_3, y_3), y_4, g(x_4, y_4), f(x_5, y_5), g(x_5, y_5)\}.

Proof: We can view the storage problem as a many-to-many cast instance with two sources sending x and y and six users obtaining {x, y}, {x, f(x, y)}, {y, f(x, y)}, {x, g(x, y)}, {y, g(x, y)}, and {f(x, y), g(x, y)}, respectively. The demands of the first user are trivially satisfied, so only the last five need to be considered. Applying the transformations above and Proposition 11, we obtain the desired term set. ■

Problem 2 can be viewed as a communication problem between a single source that transmits a message a ∈ A^10 and a user who receives a message ψ(a) ∈ A^10.
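Problem 2 can be checked by brute force for small alphabets. The sketch below (our own helpers) uses the pair of orthogonal Latin squares f(x, y) = x + y and g(x, y) = x + 2y over Z_3, a standard choice picked here for illustration, and also searches all pairs of binary functions to confirm that no solution exists for |A| = 2, consistent with the exception of order 2.

```python
from itertools import product


def psi(f, g, a):
    """Induced mapping of the Problem 2 term set; a = (x1, ..., x5, y1, ..., y5)."""
    x, y = a[:5], a[5:]
    return (x[0], f(x[0], y[0]), y[1], f(x[1], y[1]), x[2], g(x[2], y[2]),
            y[3], g(x[3], y[3]), f(x[4], y[4]), g(x[4], y[4]))


def has_dispersion_10(f, g, q: int) -> bool:
    return len({psi(f, g, a) for a in product(range(q), repeat=10)}) == q**10


# Orthogonal Latin squares of order 3 over Z_3.
order3_ok = has_dispersion_10(lambda x, y: (x + y) % 3, lambda x, y: (x + 2 * y) % 3, 3)

# Order 2: try every pair of binary functions f(x, y) = s[2x + y], g(x, y) = t[2x + y].
tables = list(product((0, 1), repeat=4))
order2_ok = any(has_dispersion_10(lambda x, y, s=s: s[2 * x + y],
                                  lambda x, y, t=t: t[2 * x + y], 2)
                for s in tables for t in tables)
print(order3_ok, order2_ok)  # True False
```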

We can give a network coding interpretation of the diversified term set. When the term set is diversified, we can select a different solution for each user independently. Therefore, we have full dispersion in the diversified case if and only if each request can be satisfied individually.

Example 10: Consider the term set Γ = {x_1, f(x_1, x_2), x_4, f(x_3, x_4)} corresponding to the satellite communication problem. The diversified term set can be written as Γ^div = {x_1, f(x_1, x_2), x_4, g(x_3, x_4)}. Since the coding functions f and g are not required to be identical, it is easier to find coding functions that achieve the maximal dispersion of 4 in the case of Γ^div. One possible choice of coding functions that achieves dispersion 4 is f(x_1, x_2) = x_2 and g(x_3, x_4) = x_3. However, this choice does not correspond to any real-life situation for the original satellite communication problem. In fact, diversifying the term set is equivalent to considering the case where the satellite has access to two independent channels to the stations, on which it can send different messages (one constructed by the function f, the other by the function g).

The max-flow min-cut theorem for the dispersion then shows that if all demands can be satisfied individually, then for all large enough alphabets, all demands can be "nearly" satisfied at once. Theorems 1 and 2 quantify this statement in terms of a small loss in one-to-one dispersion. However, this small loss may be critical when network coding is considered. Indeed, let Γ̂ be the term set associated to a given multi-user communication problem, and let ψ be the induced function of an interpretation for Γ̂. Then the one-to-one pre-image of ψ may contain very few points of the form (a, a, ..., a) ∈ A^{km}, where a ∈ A^k. However, only these points make sense for the multi-user communication, as the variable x^j_i is merely an artificial copy of x_i for all j. Consider, for example, the butterfly network: all the points in the one-to-one pre-image of dynamic routing satisfy x_1 ≠ x_3, which does not correspond to any real-life situation for the satellite communication problem.

VIII. Dynamic networks 

A. Theorem for multi-user dynamic networks 

In the analysis of dynamic communication networks, it is natural to take into account that networks change over time. Potential network changes, including link failures, point failures, and noisy channels, can be modeled by ideas vaguely akin to Kripke's possible world semantics from logic and philosophy
EH. The idea is to consider a collection of possible worlds which each could become the actual world 
as time progresses. 

A world is not only a representation of the network at a given time, but an expansion over a number of time slots. This is a generalization of the butterfly network, which can be viewed as an expansion over time of the satellite communication problem. As the model is discrete, there are only a finite number $|W|$ of possible worlds. We can think of each node in a world as a network node at a certain time slot. A point in each world thus represents a node at a given time slot, and as such there is a link from each node to its successor in time. This link is not a communication link, but represents the data transformation at the node between the two time slots. Communications might be instantaneous, i.e. connect different nodes in the same time slot. The resulting network is acyclic. Formally, we define:
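A small sketch of such a time expansion makes the acyclicity claim concrete. The assumptions are ours, not the paper's: each node has an edge to its copy in the next slot, a communication link either takes one slot ("delayed") or is used within a slot ("instant"), and acyclicity is checked with Python's standard `graphlib` module. The check shows that the expanded network is acyclic as long as the instantaneous links do not themselves form a cycle within one slot.

```python
from graphlib import CycleError, TopologicalSorter

# Illustrative sketch; the delayed/instant split and the example network are hypothetical.


def time_expand(nodes, slots, delayed=(), instant=()):
    """Predecessor map of the time-expanded network on vertices (node, t)."""
    preds = {(v, t): set() for v in nodes for t in range(slots)}
    for t in range(slots):
        for u, v in instant:                  # communication within slot t
            preds[(v, t)].add((u, t))
        if t + 1 < slots:
            for v in nodes:                   # node -> its successor in time
                preds[(v, t + 1)].add((v, t))
            for u, v in delayed:              # communication from slot t to t + 1
                preds[(v, t + 1)].add((u, t))
    return preds


def is_acyclic(preds):
    try:
        tuple(TopologicalSorter(preds).static_order())
        return True
    except CycleError:
        return False


# Hypothetical satellite network: two stations exchange data through a satellite.
nodes = ["s1", "s2", "sat"]
uplinks = [("s1", "sat"), ("s2", "sat")]
downlinks = [("sat", "s1"), ("sat", "s2")]
print(is_acyclic(time_expand(nodes, 3, instant=uplinks, delayed=downlinks)))  # True
print(is_acyclic(time_expand(nodes, 3, instant=uplinks + downlinks)))         # False
```

The second call fails because the uplink and downlink between `s1` and `sat` would close a cycle inside a single time slot.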

Definition 10 (Dynamic network): Let $U$, $W$, and $T$ be finite sets. A dynamic network is a collection $\Gamma := \{\Gamma_{u,w,t} : u \in U, w \in W, t \in T\}$ of term sets indexed by a user/receiver $u \in U$, a world $w \in W$, and a time slot $t \in T$.

Assume all coding functions that occur in terms in $\Gamma$ have been given interpretations. Then we can associate a dispersion to each term set $\Gamma_{u,w,t}$: for each choice of coding functions, the variable $\gamma_{u,w,t}$ denotes the dispersion of $\Gamma_{u,w,t}$. Typically, the same function symbol might occur in term sets in multiple worlds. In general, a coding function might be a good choice for some of the worlds, while it might be a bad choice for other worlds.

In order to take into account the fact that some worlds may be more likely than others, we consider the general case where each user $u \in U$ is assigned a utility demand $D_u$. This demand is of the form $F_u(\gamma_u) > \mathrm{dem}_u$, where $\mathrm{dem}_u$ is a real number and the utility function $F_u$ is a real-valued, non-decreasing, and continuous function of the variables $\gamma_u = \{\gamma_{u,w,t} : w \in W, t \in T\}$ representing the received dispersions. We say that the demand $D_u$ is satisfied locally if it can be satisfied when all other user demands are disregarded. By contrast, we say that $D_1, D_2, \ldots, D_m$ are satisfied globally if they can all be satisfied by the same interpretation.
Theorem 6 (Dynamic multi-user theorem): In a dynamic many-to-many cast where the demand of each user $1 \le u \le m$ can be satisfied locally, the demands $D_1, D_2, \ldots, D_m$ can be satisfied globally. The same holds for Rényi entropy demands with $\alpha \le 1$ and one-to-one dispersion demands.

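Proof: Suppose that each demand $F_u(\gamma_u) > \mathrm{dem}_u$ with $1 \le u \le m$ can be achieved locally. If we select $\delta > 0$ such that $\delta < \min_u \{F_u(\gamma_u) - \mathrm{dem}_u\}$, then in fact $F_u(\gamma_u) > \mathrm{dem}_u + \delta$ with $1 \le u \le m$ can be achieved locally. Assume that $F_u(\gamma_u) > \mathrm{dem}_u + \delta$ is achieved (locally) by the dispersions $\gamma_u = \{\gamma_{u,w,t} : w \in W, t \in T\}$. Since $F_u$ is continuous, there exists $\varepsilon > 0$ such that if the dispersions $\gamma'_u$ of $\{\Gamma_{u,w,t} : w \in W, t \in T\}$ satisfy $|\gamma'_{u,w,t} - \gamma_{u,w,t}| \le \varepsilon$, then $|F_u(\gamma'_u) - F_u(\gamma_u)| < \delta$. According to the multi-user max-flow min-cut theorem, for each $\varepsilon > 0$ there exists an interpretation (over a sufficiently large alphabet) which globally achieves dispersions $\gamma'_{u,w,t} \ge \gamma_{u,w,t} - \varepsilon$ for all $u \in U$, $w \in W$, $t \in T$. Since $F_u$ is non-decreasing, $F_u(\gamma'_u) \ge F_u(\gamma_u - \varepsilon) > F_u(\gamma_u) - \delta > \mathrm{dem}_u$, where $\gamma_u - \varepsilon$ denotes the dispersions $\gamma_{u,w,t} - \varepsilon$. Thus there exist coding functions such that $F_u(\gamma'_u) > \mathrm{dem}_u$ for $1 \le u \le m$. ■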
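As a concrete instance of the continuity step (an illustration we add, not part of the original argument), suppose $F_u$ is linear with nonnegative coefficients $c_{u,w,t}$, not all zero, as in Example 11 with $c_{u,w,t} = p_w\omega_t$. Then a suitable $\varepsilon$ can be written down explicitly:

```latex
\[
\max_{w,t}\bigl|\gamma'_{u,w,t}-\gamma_{u,w,t}\bigr| \le \varepsilon
\;\Longrightarrow\;
\bigl|F_u(\gamma'_u)-F_u(\gamma_u)\bigr|
\le \sum_{w,t} c_{u,w,t}\,\bigl|\gamma'_{u,w,t}-\gamma_{u,w,t}\bigr|
\le \varepsilon \sum_{w,t} c_{u,w,t} < \delta
\quad\text{whenever}\quad
\varepsilon < \frac{\delta}{\sum_{w,t} c_{u,w,t}}.
\]
```

Under this assumption, the $\varepsilon$ passed to the multi-user max-flow min-cut theorem depends only on $\delta$ and the total weight $\sum_{w,t} c_{u,w,t}$.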

We remark that the demands could also be expressed as $F_u(\gamma_u) \ge \mathrm{dem}_u$. In that case, the dynamic multi-user theorem indicates that the demands can asymptotically be achieved globally if they can be asymptotically achieved locally. Example 11 below shows how the utility function can cover a broad family of performance measures.

Example 11: Let $\Gamma$ consist of term sets $\Gamma_{u,w,t}$, where $u \in U$, $w \in W$, and $t \in T$. Assume world $w$ occurs with probability $p_w$ and user $u$ has assigned a weight $\omega_t$ proportional to the utility of the dispersion achieved in time slot $t$. Then the utility for each user $u \in U$ can be defined as

$$F_u(\gamma_u) = \sum_{w \in W,\, t \in T} p_w \omega_t \gamma_{u,w,t},$$

which is a continuous function of the variables $\gamma_u$. Asymptotically, the maximal utility achievable for user $u$ is given by

$$\sum_{w \in W,\, t \in T} p_w \omega_t \rho_{u,w,t},$$

where $\rho_{u,w,t}$ denotes the min-cut of $\Gamma_{u,w,t}$.
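The expected utility of Example 11 is a plain weighted sum, so it is easy to evaluate. The sketch below uses made-up probabilities $p_w$, weights $\omega_t$, and min-cuts $\rho_{u,w,t}$ for a single user (all hypothetical numbers). It computes the asymptotic maximum $\sum p_w\omega_t\rho_{u,w,t}$ and the utility when every dispersion falls $\varepsilon$ short of its min-cut, which loses exactly $\varepsilon\sum_{w,t} p_w\omega_t$.

```python
# Illustrative sketch; all numbers below are hypothetical.


def utility(p, omega, gamma_u):
    """F_u = sum over (w, t) of p_w * omega_t * gamma_{u,w,t}."""
    return sum(p[w] * omega[t] * g for (w, t), g in gamma_u.items())


p = {"w1": 0.9, "w2": 0.1}                 # probability of each world
omega = {0: 1.0, 1: 0.5}                   # weight of each time slot
rho = {("w1", 0): 2, ("w1", 1): 2,         # min-cut of each Gamma_{u,w,t}
       ("w2", 0): 1, ("w2", 1): 2}

best = utility(p, omega, rho)              # asymptotic maximum for this user
eps = 0.01
near = utility(p, omega, {k: r - eps for k, r in rho.items()})
print(round(best, 6), round(near, 6))      # 2.9 2.885
```

Here $\sum_{w,t} p_w\omega_t = 1.5$, so a shortfall of $\varepsilon = 0.01$ in every dispersion costs $0.015$ in utility.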

The example above can be viewed as an asymptotic generalization of the network coding theorem, which only considers one time slot and one possible world. A proper generalization of that result can be obtained by considering linear coding functions over a diversified term set for a multi-user communication problem.

B. Clairvoyance and term equations 

In a dynamic network, various unpredictable network changes (e.g. link failures) might happen in some of the possible worlds. A realistic choice of coding functions cannot look into the future and take into account which link might fail during transmission. Nonetheless, we define clairvoyance as the case where each node "knows" in which of the possible worlds the network is. We can define this formally as follows:

Definition 11 (Clairvoyant coding): The clairvoyant version $\Gamma^{\mathrm{clair}}$ of $\Gamma$ is defined as the collection of term sets where function symbols have been diversified so that function symbols in different worlds are distinct, e.g. each function symbol that occurs in term sets with index $w \in W$ is assigned an (additional) index $w$. An assignment of functions to the diversified symbols $f_w$, $w \in W$, is said to consist of clairvoyant coding functions for $f$.
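Definition 11 is a purely syntactic renaming, which the following sketch makes explicit. It assumes a hypothetical encoding of terms (variables as strings, compound terms as tuples `(symbol, arg_1, ..., arg_d)`), stores a dynamic network as a dictionary keyed by $(u, w, t)$, and appends the world index to every function symbol; the function names are hypothetical. The example data are the term sets of Example 12 below, with noise treated as a variable.

```python
# Illustrative sketch; the term encoding and function names are hypothetical.


def clairvoyant_term(term, world):
    """Index every function symbol in `term` by `world`; variables stay unchanged."""
    if isinstance(term, str):
        return term
    head, *args = term
    return (f"{head}_{world}", *(clairvoyant_term(a, world) for a in args))


def clairvoyant_network(network):
    """Map {(u, w, t): [terms]} to its clairvoyant version."""
    return {(u, w, t): [clairvoyant_term(s, w) for s in terms]
            for (u, w, t), terms in network.items()}


fxy = ("f", "x", "y")
network = {("dest1", 1, 0): ["noise1", fxy], ("dest2", 1, 0): ["y", fxy],
           ("dest1", 2, 0): ["x", fxy], ("dest2", 2, 0): ["noise2", fxy]}
for key, terms in clairvoyant_network(network).items():
    print(key, terms)
```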

As another application of the multi-user dispersion theorem, we obtain that clairvoyance does not improve the performance of a network in terms of dispersion. The proof of Proposition 12 below is based on arguments similar to those used to prove the theorem for the dispersion.

Proposition 12 (Clairvoyance does not increase dispersion): If in a dynamic many-to-many network the user demands $D_1, D_2, \ldots, D_m$ can be satisfied in $\Gamma^{\mathrm{clair}}$, then they can be satisfied in $\Gamma$.

It should be noted that this result for the dispersion and the Rényi entropy is very much in the spirit of diversity coding and random linear network coding, and is thus not surprising. Indeed, diversity coding intuitively deals with link failures and noisy channels by mixing the inputs and transmitting a large number of independent messages; clairvoyance is thus rendered useless. However, our result about the one-to-one dispersion is remarkable, for, as seen in Section IV-A, a high one-to-one dispersion involves a controlled non-linear mixing, which contradicts the philosophy of diversity coding.

We finish this section by showing that a model based on term equations can take into account the fact that each user not only requires a high dispersion, but also a certain number of specific messages. Recall from Section V that if a user receives the terms $\{t_1, t_2, \ldots, t_r\}$, we associate the decoding functions $d_1, d_2, \ldots, d_s$, where we require that $d_i(\psi(a)) = a_i$ for all $1 \le i \le s$. This can be more succinctly expressed by the term equation $\tau_i = x_i$, where $\tau_i$ is the term defined as $\tau_i = d_i(t_1, t_2, \ldots, t_r)$.
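For a fixed interpretation over a small alphabet, whether the term equation $\tau_i = x_i$ can be solved is a finite check: a decoding function $d_i$ with $d_i(\psi(a)) = a_i$ exists exactly when no two inputs with different $i$-th coordinates produce the same received tuple. The sketch below (hypothetical helper `decodable`, alphabet $\mathbb{Z}_q$ with $q = 3$ chosen for illustration) applies this to a station in the satellite problem that knows $x$ and receives $f(x,y)$.

```python
from itertools import product

# Illustrative sketch; the helper name and the alphabet Z_3 are our assumptions.


def decodable(received, i, n_vars, q):
    """True iff some d_i satisfies d_i(received(a)) == a[i] for all a in Z_q^n."""
    decoded = {}
    for a in product(range(q), repeat=n_vars):
        r = received(a)
        if decoded.setdefault(r, a[i]) != a[i]:
            return False  # two inputs agree on r but differ in coordinate i
    return True


q = 3
# Station knowing x = a[0] and receiving f(x, y); it wants y = a[1].
with_sum = lambda a: (a[0], (a[0] + a[1]) % q)   # f(x, y) = x + y mod q
with_copy = lambda a: (a[0], a[0])               # f(x, y) = x
print(decodable(with_sum, 1, 2, q), decodable(with_copy, 1, 2, q))  # True False
```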

In the case of message demands, clairvoyance can clearly make a difference, as seen in Example 12 below.

Fig. 8. Dynamic network with message demands.

Example 12: Consider the dynamic network depicted in Figure 8, where one of two links is always contaminated with pure noise. Note that this is not the butterfly network in Figure 6. Without using clairvoyance, the message demands of the destinations can be expressed as term equations

$$
\begin{aligned}
d_1(\mathrm{noise}_1, f(x,y)) &= x\\
d_2(y, f(x,y)) &= y\\
d_3(x, f(x,y)) &= x\\
d_4(\mathrm{noise}_2, f(x,y)) &= y.
\end{aligned}
$$

Here we assumed that the decoding functions can distinguish the messages $x$ and $y$ from noise, which is why we can apply different decoding functions to each of the four potential decoding situations. It is clear that the message demands of both users cannot be satisfied globally without clairvoyance: the first and fourth equations would require $f(x,y)$ alone to determine both $x$ and $y$, which is impossible since $|A|^2 > |A|$ for $|A| \ge 2$. On the other hand, using clairvoyance, the problem is turned into the following set of term equations:

$$
\begin{aligned}
d_1(\mathrm{noise}_1, f_1(x,y)) &= x\\
d_2(y, f_1(x,y)) &= y\\
d_3(x, f_2(x,y)) &= x\\
d_4(\mathrm{noise}_2, f_2(x,y)) &= y.
\end{aligned}
$$

Thus, letting $f_1(x,y) = x$ for world 1 and $f_2(x,y) = y$ for world 2 solves the communication problem by an appropriate choice of decoding functions $d_1, d_2, d_3$, and $d_4$.
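The impossibility claim in Example 12 can be checked exhaustively for the binary alphabet. The sketch below is our illustration (hypothetical helper names, $A = \{0, 1\}$): it treats the noisy input as carrying no information, so $d_1$ and $d_4$ succeed exactly when $f(x,y)$ alone determines $x$, respectively $y$, while $d_2$ and $d_3$ can simply return the message received directly. It enumerates all 16 functions $f : A^2 \to A$ and then tests the clairvoyant pair $f_1, f_2$.

```python
from itertools import product

# Illustrative sketch; helper names and the binary alphabet are our assumptions.
A = (0, 1)


def determines(f, coord):
    """True iff f(x, y) determines x (coord == 0) or y (coord == 1)."""
    seen = {}
    for xy in product(A, repeat=2):
        if seen.setdefault(f(*xy), xy[coord]) != xy[coord]:
            return False
    return True


def from_table(table):
    """Function A^2 -> A given by its values at (0,0), (0,1), (1,0), (1,1)."""
    return lambda x, y: table[2 * x + y]


# Without clairvoyance, one f must serve d_1 (recover x) and d_4 (recover y).
common = [t for t in product(A, repeat=4)
          if determines(from_table(t), 0) and determines(from_table(t), 1)]
print(len(common))  # 0

# With clairvoyance, world 1 uses f_1(x, y) = x and world 2 uses f_2(x, y) = y.
f1 = lambda x, y: x
f2 = lambda x, y: y
print(determines(f1, 0), determines(f2, 1))  # True True
```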

In brief, clairvoyance does not help in many-to-many casting where the users have dispersion demands. But, not surprisingly, in general (when different users have different message demands), clairvoyance can greatly increase the performance of the network.

IX. Conclusion 

There is an extensive literature dealing with logistics and scheduling in traditional commodity networks. The theories are very diverse, ranging over linear programming, algorithms for transport of "discrete" goods, game theory, traffic flow theory, network exchange theory, economic network theory, packet switching, and queuing theory. Surprisingly, to the authors' knowledge there have been relatively few attempts to develop similar theories that cover transport of digital information in communication networks. We believe that there is a strong need to develop theories for transport of digital information analogous to theories of transport of ordinary commodities.

In this paper, we considered communication networks based on term sets. These networks are shown 
to generalize typical views of network coding, and offer an interpretation as flows of information. In 
particular, the max-flow min-cut theorem indicates that the maximum amount of information that can be 
transmitted through a network can be viewed as the min-cut of the term set. Also, the use of dynamic 
routing allowed us to virtually eliminate distributed coding functions. These considerations illustrate the 
following gold standard for many-to-many casts, which states that if all demands can be satisfied locally, 
then they can be satisfied globally. This standard can be applied to any situation and to any protocol, 
even routing. 

Gold Standard for many-to-many casts: Assume that the requirements of each user (with the caveats discussed in the paper) can be accommodated when all other users' demands are disregarded. Then it is possible to accommodate all users' requirements simultaneously.

Appendix 

A. Proof of Proposition 7

The proof relies on the following two technical lemmas. 

Lemma 3: A proper subterm is a subterm that is not equal to the term. We have the following:

• a) The subterm relation is transitive, i.e. if $t_1$ is a subterm of $t_2$ and $t_2$ is a subterm of $t_3$, then $t_1$ is a subterm of $t_3$.

• b) If $t_1$ is a proper subterm of $t_2$ and $t_2$ is a subterm of $t_3$, then $t_1$ is a proper subterm of $t_3$.

• c) The parsing of terms is unambiguous. More specifically, if two terms $t_1$ and $t_2$ are identical, they have the same principal function symbol (say $f$). If we write $t_1 = f(u_1, u_2, \ldots, u_d)$ and $t_2 = f(u'_1, u'_2, \ldots, u'_d)$, then $u_j = u'_j$ for $j = 1, 2, \ldots, d$.
The proof of Lemma 3 is easy and hence omitted.
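With terms encoded as nested tuples (a hypothetical encoding: variables as strings, compound terms as `(symbol, arg_1, ..., arg_d)`), the notions in Lemma 3 become short functions. Structural tuple equality in Python plays the role of the unambiguous parsing of part c), and the example checks parts a) and b) on one chain of terms. The function names are ours.

```python
# Illustrative sketch; the term encoding and function names are hypothetical.


def subterms(term):
    """The set of all subterms of `term`, including `term` itself."""
    result = {term}
    if not isinstance(term, str):
        for arg in term[1:]:
            result |= subterms(arg)
    return result


def is_subterm(s, t):
    return s in subterms(t)


def is_proper_subterm(s, t):
    return s != t and is_subterm(s, t)


t1 = "x"
t2 = ("g", "x")
t3 = ("f", ("g", "x"), "y")
print(is_subterm(t1, t2), is_subterm(t2, t3), is_subterm(t1, t3))   # True True True
print(is_proper_subterm(t1, t2), is_proper_subterm(t1, t3))         # True True
```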

Lemma 4: Assume $t$ is a term and let $s_1, s_2, \ldots, s_p$ be a term-cut for $t$ such that $s_i$ is not a subterm of $s_j$ for all $i \neq j$. Let $u$ be a subterm of $t$. Then there are two exclusive possibilities:

• i) $s_1, s_2, \ldots, s_p$ is a term-cut for $u$.

• ii) $u$ is a proper subterm of some $s_j$, $j = 1, 2, \ldots, p$ ($u$ might be a subterm of more than one $s_j$).

Proof: We first show that the possibilities i) and ii) are exclusive. Suppose on the contrary that a subterm $u$ satisfies both i) and ii). Then some $s_i$ is a proper subterm of $u$, and by Lemma 3 it is a subterm of some $s_j$, which contradicts our assumption.

We now show that at least one of i) or ii) is satisfied. Assume $u$ is chosen as a subterm of $t$ that satisfies neither i) nor ii), and such that it is not a proper subterm of some other subterm of $t$ which also fails i) and ii). Since the term $t$ satisfies i), $u$ must be a proper subterm of $t$. Thus $u$ occurs as a direct subterm in some subterm $v$ of $t$, i.e. $v$ can be written as $v = g(\ldots, u, \ldots)$, where $g$ is the principal function symbol in $v$. Since the subterm relation is transitive (Lemma 3 part a), $v$ does not satisfy ii), and as $v = g(\ldots, u, \ldots)$ with $u$ not satisfying i), it follows from Lemma 3 part c that $v$ also fails to satisfy i). But this contradicts the assumption that $u$ was not a proper subterm of a subterm that satisfies neither i) nor ii). ■

We now prove Proposition 7.

Proof: Assume first that $C = \{s_1, s_2, \ldots, s_p\} \subseteq \Gamma_{\mathrm{sub}}$ is a term-cut for $\Gamma$. If some term $s_i$ is a subterm of $s_j$ with $i \neq j$, remove $s_j$ from $C$. What remains after having repeated this procedure is a set $C' \subseteq C$ of minimal subterms in $C$. Clearly $C'$ is a term-cut for $\Gamma$. To keep the notation simple, let us assume $C' = \{s_1, s_2, \ldots, s_{p'}\}$. It suffices to show that $C' \subseteq \Gamma_{\mathrm{sub}}$ is a cut that separates $S = \Gamma_{\mathrm{var}}$ from $T = \Gamma$ in the directed graph $G_\Gamma$.

Assume there is a path $P$ from a variable $x \in \Gamma_{\mathrm{var}}$ to a term $t \in \Gamma$ which does not intersect $C'$. Since $C'$ satisfies the conditions of Lemma 4, each term in $P$ satisfies either i) or ii). Since $x$ satisfies ii) while $t$ satisfies i), consider the first subterm $u$ in $P$ that satisfies i). The predecessor of $u$ in $P$, which is a proper subterm of $u$, satisfies ii), and hence is a proper subterm of some $s_j$. Thus $u$ must be identical to that $s_j$, since otherwise we could not express $u$ as a function of the subterms in the cut $C'$. This is not possible since $P$ was assumed not to intersect $C' = \{s_1, s_2, \ldots, s_{p'}\}$.

To prove the converse, assume that $C = \{s_1, s_2, \ldots, s_p\} \subseteq \Gamma_{\mathrm{sub}}$ is a cut that separates $S = \Gamma_{\mathrm{var}}$ from $T = \Gamma$ in the directed graph $G_\Gamma$. Each subterm $t$ that does not belong to the cut $C$ and is on the same side of the cut as $T$ has each of its arguments either in the cut or on the same side of the cut as $T$. A simple argument by induction shows that subterms that have all their arguments either in the cut or on the same side of the cut as $T$ can be written in the form $g(s_1, s_2, \ldots, s_p)$. This shows that $C$ defines a term-cut for $\Gamma$.

The fact that the size of a minimal term-cut is identical to the size of a minimal cut follows trivially 
from the first part of the proposition. ■ 
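Proposition 7 means that the min-cut of a term set can be computed with standard graph algorithms. The sketch below is our illustration, not the paper's construction: it builds $G_\Gamma$ on the subterms (an edge from each argument to the term containing it), splits every vertex into an in/out pair of capacity 1 so that cut vertices may be any subterms, including variables and members of $\Gamma$, and counts vertex-disjoint paths from $\Gamma_{\mathrm{var}}$ to $\Gamma$ with a breadth-first augmenting-path (Edmonds-Karp) search. By Menger's theorem this count equals the minimum vertex cut. Terms use a hypothetical nested-tuple encoding, and the helper names are ours.

```python
from collections import defaultdict, deque

# Illustrative sketch; the term encoding and helper names are hypothetical.


def subterms(term):
    result = {term}
    if not isinstance(term, str):
        for arg in term[1:]:
            result |= subterms(arg)
    return result


def min_cut(gamma):
    """Maximum number of vertex-disjoint paths from the variables to gamma in G_Gamma."""
    nodes = set().union(*(subterms(t) for t in gamma))
    big = len(nodes) + 1                       # acts as infinite capacity
    cap = defaultdict(int)
    adj = defaultdict(set)

    def edge(u, v, c):
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)                          # allows residual moves back along (u, v)

    for n in nodes:
        edge(("in", n), ("out", n), 1)         # each subterm may carry one path
        if isinstance(n, str):
            edge("SRC", ("in", n), big)        # variables form the source side
        else:
            for arg in n[1:]:
                edge(("out", arg), ("in", n), big)  # argument -> term
    for t in set(gamma):
        edge(("out", t), "SNK", big)           # members of gamma form the sink side

    flow = 0
    while True:
        parent = {"SRC": None}
        queue = deque(["SRC"])
        while queue and "SNK" not in parent:   # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if "SNK" not in parent:
            return flow
        v = "SNK"
        while parent[v] is not None:           # every path uses a capacity-1 vertex
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


satellite = ["x1", ("f", "x1", "x2"), "x4", ("f", "x3", "x4")]
k = 2
gamma_prime = [("f", (f"g{i}", "h1"), *(f"h{j}" for j in range(2, k + 2)))
               for i in range(1, k + 2)]
print(min_cut(satellite), min_cut(gamma_prime))  # 4 3
```

The second example is the term set $\Gamma'$ used in the proof of Proposition 8, here with $k = 2$.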

B. Proof of Proposition 8

We first consider a term set related to $\Gamma$, where we convert the function symbols $h_i(X)$ into variables.

Lemma 5: Consider the following set of $k+1$ terms built on $k+1$ variables $h_1, h_2, \ldots, h_{k+1}$ and $k+2$ function symbols $f, g_1, g_2, \ldots, g_{k+1}$:

$$\Gamma' = \{t_i = f(g_i(h_1), h_2, h_3, \ldots, h_{k+1}) : 1 \le i \le k+1\}.$$

$\Gamma'$ has min-cut $k+1$, and hence dispersion arbitrarily close to $k+1$. However, if the coding functions are linear, then $\Gamma'$ has dispersion at most 2.

Proof: We first prove that the min-cut of $\Gamma'$ is $k+1$ by constructing $k+1$ vertex-disjoint paths from $\Gamma'_{\mathrm{var}}$ to $\Gamma'$ in $G_{\Gamma'}$. There is a path from $h_1$ to $t_1$ (through $g_1(h_1)$) and $k$ paths $(h_i, t_i)$ for $2 \le i \le k+1$; all these paths are clearly vertex-disjoint. By the max-flow min-cut theorem for the dispersion, $\Gamma'$ has dispersion arbitrarily close to $k+1$.

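Let $\psi$ be a matrix linear interpretation for $\Gamma'$, i.e. $g_i(b) = G_i b$ for all $1 \le i \le k+1$ and $f(b_1, \ldots, b_{k+1}) = \sum_{i=1}^{k+1} F_i b_i$. Consider the induced mapping $\psi$ composed with the permutation $\pi$ of $A^{k+1}$ defined as $\pi(b) = (b_1, b_2 - b_1, \ldots, b_{k+1} - b_1)$. Then, for an input $a = (a_1, \ldots, a_{k+1})$ assigning $a_i$ to $h_i$, we obtain

$$\pi \circ \psi(a) = \Bigl(F_1 G_1 a_1 + \sum_{i=2}^{k+1} F_i a_i,\ F_1(G_2 - G_1)a_1,\ \ldots,\ F_1(G_{k+1} - G_1)a_1\Bigr),$$

and $|\mathrm{image}(\pi \circ \psi)| \le |A|^2$. However, we have $|\mathrm{image}(\pi \circ \psi)| = |\mathrm{image}(\psi)|$, and hence the interpretation has dispersion at most 2. ■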
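The linear bound can be checked exhaustively in a small case. The sketch below restricts to scalar coefficients over $\mathbb{Z}_3$ (that is, $1 \times 1$ matrices, a special case of the matrix linear interpretations in the proof) and takes $k = 2$; these choices and the helper name are ours. Over all $3^3 \cdot 3^3$ coefficient choices, the image never exceeds $|A|^2 = 9$ points, i.e. the dispersion never exceeds 2, even though the min-cut of $\Gamma'$ is 3.

```python
from itertools import product

# Illustrative sketch; scalar coefficients over Z_q and k = 2 are our choices.


def image_size(k, q, G, F):
    """|image(psi)| for the scalar-linear interpretation of Gamma' over Z_q, with
    g_i(b) = G[i] * b and f(b_1, ..., b_{k+1}) = sum_j F[j] * b_j (mod q)."""
    image = set()
    for h in product(range(q), repeat=k + 1):            # values of h_1, ..., h_{k+1}
        rest = sum(F[j] * h[j] for j in range(1, k + 1))  # F_2 h_2 + ... + F_{k+1} h_{k+1}
        image.add(tuple((F[0] * G[i] * h[0] + rest) % q for i in range(k + 1)))
    return len(image)


k, q = 2, 3
worst = max(image_size(k, q, G, F)
            for G in product(range(q), repeat=k + 1)
            for F in product(range(q), repeat=k + 1))
print(worst, q ** 2)  # 9 9
```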
We now prove Proposition 8.

Proof: By Theorem 1, there exists $n$ such that for any $A$ with $|A| \ge n$, there is an interpretation $\psi$ for $\Gamma'$ over $A$ with one-to-one dispersion above $k$. Denote the pre-image of $\mathrm{one}(\psi)$ as $I$; then $|I| \ge |A|^k$. Furthermore, assign the coding functions $h_i$ such that the inputs in $A^k$ are mapped to $|A|^k$ distinct elements of $I$. Therefore, the obtained interpretation for $\Gamma$ has perfect dispersion of $k$. On the other hand, since the dispersion of an interpretation for $\Gamma$ is no more than that of an interpretation for $\Gamma'$, any linear interpretation for $\Gamma$ has dispersion at most 2. ■



March 2, 2011 



DRAFT 






C. Entropy of the interpretation $\psi_p$

Using an idea by Keevash and Sisask based on Fourier analysis ifTTIl, it is possible to show that the image of $\psi_p$ for each prime number $p$ is partitioned into 4 sets $S_1, S_2, S_{p-1}, S_{3p-2}$, where:

• $S_1$ contains $3p(p-1)^2$ points with exactly one preimage;

• $S_2$ contains $p(p-1)^2(p-3)/2$ points with exactly two preimages;

• $S_{p-1}$ contains $2p(p-1)$ points with exactly $p-1$ preimages;

• $S_{3p-2}$ contains $p$ points with exactly $3p-2$ preimages.

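Therefore, the size of the image is $\frac{p}{2}(p^3 + p^2 - p + 1)$. Moreover, $\psi_p$ has Rényi entropy of order $\alpha \neq 1$ given by

$$H_\alpha(\psi_p) = \frac{1}{1-\alpha} \log_p\Bigl(3(p-1)^2 + (p-1)^2(p-3)2^{\alpha-1} + 2(p-1)(p-1)^\alpha + (3p-2)^\alpha\Bigr) + \frac{1-4\alpha}{1-\alpha}.$$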

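The limit of the Rényi entropy (including the case where $\alpha = 1$) when the size of the alphabet tends to infinity can be shown to be given by

$$\lim_{p \to \infty} H_\alpha(\psi_p) = \begin{cases} 4 & \text{if } 0 < \alpha \le 2,\\[2pt] \dfrac{3\alpha - 2}{\alpha - 1} & \text{if } 2 < \alpha < \infty,\\[2pt] 3 & \text{if } \alpha = \infty. \end{cases}$$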

This shows that the bound '■^^^j'" ^ is a matching upper and lower bound for k = 2. 
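The counts above can be sanity-checked numerically. The sketch below (our illustration; helper names are hypothetical) encodes the four classes as (size, number of preimages) pairs, checks that the preimages add up to $p^4$ inputs and that the image size matches $\frac{p}{2}(p^3+p^2-p+1)$, and evaluates the Rényi entropy both directly from the partition and from the closed form. It assumes base-$p$ logarithms and a uniformly distributed input on a domain of size $p^4$, the value implied by the counts.

```python
from math import inf, log

# Illustrative sketch; assumes a uniform input on p**4 points and base-p logarithms.


def partition(p):
    """(number of image points, preimages per point) for S_1, S_2, S_{p-1}, S_{3p-2}."""
    return [(3 * p * (p - 1) ** 2, 1),
            (p * (p - 1) ** 2 * (p - 3) // 2, 2),
            (2 * p * (p - 1), p - 1),
            (p, 3 * p - 2)]


def renyi_direct(p, alpha):
    """Base-p Renyi entropy of psi_p(a) for a uniform on p**4 inputs."""
    n = p ** 4
    parts = [(c, m / n) for c, m in partition(p) if c > 0]
    if alpha == 1:                                      # Shannon entropy
        return -sum(c * q * log(q, p) for c, q in parts)
    if alpha == inf:                                    # min-entropy
        return -log(max(q for _, q in parts), p)
    return log(sum(c * q ** alpha for c, q in parts), p) / (1 - alpha)


def renyi_closed_form(p, a):
    """The closed-form expression for H_alpha(psi_p), alpha != 1."""
    x = (3 * (p - 1) ** 2 + (p - 1) ** 2 * (p - 3) * 2 ** (a - 1)
         + 2 * (p - 1) * (p - 1) ** a + (3 * p - 2) ** a)
    return log(x, p) / (1 - a) + (1 - 4 * a) / (1 - a)


for p in (5, 7, 11):
    print(p, sum(c for c, _ in partition(p)), sum(c * m for c, m in partition(p)) == p ** 4)
p = 1_000_003
print([round(renyi_direct(p, a), 3) for a in (0.5, 1, 2, 3, inf)])
```

For large $p$ the printed values drift toward the limits stated above, but slowly, since the correction terms decay like $1/\log p$.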